Read This First



The purpose of this user's guide is to serve as a reference book for the 
TMS320C40 and TMS320C40-40 digital signal processors. Throughout the 
book, all references to the TMS320C40 apply to the TMS320C40-40 as 
well, unless an exception is noted. This document provides information to 
assist managers and hardware/software engineers in application
development.



How to Use This Manual 



This document contains the following chapters: 

Chapter 1 Introduction 

A general description of the TMS320C40, its key features, and typical
applications.

Chapter 2 Architectural Overview 

Functional block diagrams. TMS320C40 design description, hardware 
components, and device operation. Instruction set summary. 

Chapter 3 CPU Registers, Memory, and Cache 

Description of the registers in the CPU primary register file and expansion 
register file. Memory maps. Instruction cache architecture, algorithm, and 
control bits. 

Chapter 4 Data Formats and Floating-Point Operation 

Description of signed and unsigned integer and floating-point formats.
Discussion of floating-point multiplication, addition, subtraction,
normalization, rounding, conversions, and reciprocals.

Chapter 5 Addressing 

Addressing types. Operation, encoding, and implementation of addressing 
modes. Format descriptions. Circular and bit-reversed addressing. System 
stack management. 






Preface — Read This First 

Chapter 6 Program Flow Control 

Software control of program flow using repeat modes, different types of 
branching, traps, interrupts, and interlocked operations. Reset operation, 
including resulting values in registers and on pins. 

Chapter 7 External Bus Operation 

Discussion of the two 80-pin local and global memory interfaces. 
Programmable wait-states. Memory access timing. Signal group control. 
Interlocked instructions. Interrupt acknowledge timing. 

Chapter 8 Communication Ports 

Description of the six bidirectional, 160-megabit-per-second (at 40-ns
cycle time) communication ports designed for sharing tasks between
processors. Memory maps of the ports and their registers. Port operation
and coordination of port activity with the CPU and DMA coprocessor.

Chapter 9 DMA Coprocessors and 'C40 Timers 

DMA coprocessor operation. Description of coprocessor registers (channel 
control, channel address, index, transfer count, and link pointer). Use in 
unified and split mode. Priority and CPU/DMA arbitration. Autoinitialization 
and interrupts. Operation of the 'C40 timers; their registers (global control, 
timer counter, and period). 

Chapter 10 Pipeline Operation 

Discussion of 'C40 pipeline operations, including pipeline conflicts and
methods for resolving them. Clocking of memory accesses.

Chapter 11 Assembly Language Instructions 

Functional listing of instructions. Condition code definitions (for conditional 
instructions such as branch conditional). Alphabetized individual instruction 
descriptions with examples. 

Chapter 12 Software Applications 

Software application examples for using various TMS320C40 
instruction-set and programming features. Code listings enhance 
explanations. 

Chapter 13 Hardware Applications 

Hardware design techniques and application examples for interfacing to 
memories, peripherals, or other microcomputers/microprocessors. Code 
listings, schematics, and timing diagrams facilitate explanations. 

Chapter 14 TMS320C4x Signal Descriptions and Electrical Characteristics 

Pin locations and pin descriptions. 'C40 dimensions and package 
description. Electrical characteristics. Signal timing and characteristics. 
Appendix A TMS320C40 Sockets

Description of two sockets available for the TMS320C40.

Appendix B XDS510 Design Considerations 

Considerations for designing your TMS320C40 target system for use with 
the XDS510 emulator. 
Style and Symbol Conventions

This document uses the following conventions: 

□ Program listings, program examples, interactive displays, file names,
and symbol names are shown in a special font. Examples use a bold
version of the special font for emphasis. Here is a sample program
listing:

0011 0005 0001 .field 1, 2 

0012 0005 0003 .field 3, 4 

0013 0005 0006 .field 6, 3 

0014 0006 .even 

□ In syntax descriptions, the instruction, command, or directive is in a 
bold face font and parameters are in italics. Portions of a syntax that 
are in bold face should be entered as shown; portions of a syntax that 
are in italics describe the type of information that should be entered. 
Here is an example of an instruction: 

CMPF3 src2,src3 

Note: Although the instruction mnemonic (CMPF3 in this example) is in
capital letters, the 'C40 assembler is not case sensitive; it can
assemble mnemonics entered in either uppercase or lowercase.

CMPF3 is the instruction mnemonic. This instruction has two 
parameters, indicated by src2 and src3. 
□ Square brackets ( [ and ] ) identify an optional parameter. If you use an
optional parameter, you must specify the information within the 
brackets; however, you don't enter the brackets themselves. Here's an 
example of an instruction that has an optional parameter: 

LDP src[,DP]

The LDP instruction is shown with two parameters; one is optional. The
first parameter, src, is required. The second parameter, DP, is optional.
As this syntax shows, if you use the optional second parameter, you 
must precede it with a comma. 

□ Braces ( { and } ) indicate a list. The symbol | (read as or) separates
items within the list. Here's an example of a list:

{* | *+ | *-}

This provides three choices: *, *+, or *-.

Unless the list is enclosed in square brackets, you must choose one 
item from the list. 

□ The following is the format for a varying number of parameters. For
example, the .byte directive can have up to 100 parameters. The syntax
for this directive is

.byte value1 [,..., valuen]

This syntax shows that .byte must have at least one value parameter,
but you have the option of supplying additional value parameters
separated by commas.






Information About Cautions and Warnings 

□ A caution describes a situation that could potentially damage your 
software or equipment. 




□ A warning describes a situation that could potentially cause harm to 
you. 



This is what a warning looks like. 



Please read each caution or warning carefully. The information is provided 
for your protection. 



Trademarks 

ABEL is a trademark of the Data I/O Corporation. 

SPOX is a trademark of Spectron Microsystems, Inc. 
Chapter 1



Introduction 




Texas Instruments' TMS320C4x generation floating-point processors are 
designed specifically to meet the needs of parallel processing and other 
real-time embedded applications. TMS320C4x products consist of both 
parallel processing devices and development tools. With world-class 
parallel-processing development tools, designers are able to fully utilize the 
immense performance of 275 MOPS (millions of operations per second) 
and 320 Mbytes per second throughput made available by the TMS320C4x 
generation. 

This chapter provides a brief overview of the TMS320C4x generation. Major
topics covered are as follows:

Section Page 

1.1 The TMS320 Family 1-2 

1.2 Parallel Processing 1-3

1.3 The TMS320C4x Generation 1-4

1.4 Applications 1-11 



1.1 The TMS320 Family 



The TMS320C4x is one of five generations in the TMS320 family of digital
signal processors. The TMS320C1x, TMS320C2x, and TMS320C5x offer
designers a complete line of general-purpose and application-specific
fixed-point DSPs. The TMS320C3x and TMS320C4x generations round out the
TMS320 family, providing an ensemble of floating-point DSPs. The
TMS320 family has blossomed from a single device introduced in 1982, the
TMS32010, to nearly thirty different products across five CPU architectures.
On-chip hardware multipliers, register files, barrel shifters, ALUs, ROM,
RAM, caches, and I/O peripherals, along with massive internal busing (all
within a product as programmable as a general-purpose microprocessor),
make TI's TMS320 devices ideal for the gamut of compute-intensive
applications.



Figure 1-1. 
1.2 Parallel Processing

The need for parallel processing is quickly growing. As floating-point
performance requirements grow exponentially, semiconductor manufacturers
can no longer meet the need with single processing elements. Processors
not designed for parallel processing are inadequate for the task, as
interprocessor communication quickly saturates device I/O and adversely
affects computing efficiency. Products in the TMS320C3x generation took the
first step in addressing the need for parallel processing by providing
designers with two external interface ports, each with a comprehensive
memory interface. This yields an immense amount of I/O bandwidth. Devices
in the TMS320C4x generation go several steps further by incorporating
on-chip hardware to facilitate high-speed interprocessor communication and
concurrent I/O without degrading CPU performance. These features, coupled
with a host of sophisticated parallel processing development tools, make
the TMS320C4x generation of floating-point processors ideal for real-time
embedded applications.
1.3 TMS320C4x Features

The TMS320C4x generation consists of two equally important aspects:
parallel processing devices and parallel processing development tools.

1.3.1 TMS320C40 Device Key Features

The primary features of the TMS320C4x devices are:

□ Six communication ports for high-speed interprocessor communication.
Communication port key features include: 

■ 20-Mbytes/sec asynchronous transfer rate at each port for maximum
data throughput

■ Direct (glueless) processor-to-processor communication for ease 
of use 

■ Bidirectional transfers for maximum communication flexibility 

□ Six-channel DMA coprocessor for concurrent I/O and CPU operation,
thereby maximizing sustained CPU performance by relieving the CPU
of burdensome I/O. DMA coprocessor key features include:

■ Concurrent data transfers and CPU operation for sustained CPU 
performance 

■ Self-programming (autoinitialize) capability for each channel, so
the CPU is not needed for initialization, maximizing sustained CPU
performance

■ Data transfers to and from anywhere in the processor's memory 
map for maximum flexibility 

□ High-performance DSP CPU capable of 275 MOPS and 320 Mbytes/sec.
CPU key features include:

■ Eleven operations per cycle throughput, resulting in massive
computing parallelism and sustained CPU performance

■ 40-ns and 50-ns instruction cycle times 

■ 40/32-bit single-cycle floating-point/integer multiplier for high
performance in computationally intensive algorithms

■ Single-cycle IEEE floating-point conversion for efficient interface to 
IEEE-compatible processors 

■ Hardware divide and inverse square root support for high
performance

■ Byte and half-word manipulation capabilities for fast data packing
and unpacking
■ Source code compatible with TMS320C3x generation for easy upward
and downward mobility

■ Support for linear, circular, and bit-reversed addressing for high 
performance 

■ Single-cycle branches, calls, and returns for fast program control 

■ Single-cycle barrel shifter for 0- to 31-bit right or left shifts for
fast bit manipulation

■ Relocatable reset and interrupt vectors for easy integration into 
parallel processing systems 

□ Two identical external data and address buses supporting shared
memory systems and high data rate, single-cycle transfers. Key features
include:

■ High port data-transfer rate of 100 Mbytes/sec

■ 16-Gbyte continuous program/data/peripheral address space for
maximum design flexibility

■ Status pins that signal type of memory access requested for fast, 
intelligent bus arbitration in shared memory systems 

■ Separate address, data, and control-enable pins for high-speed 
bus arbitration 

■ Four sets of memory-control signals support different speed 
memories in hardware, enabling efficient use of low- and high- 
speed memories 

□ On-chip analysis module supporting efficient, state-of-the-art parallel
processing debug. Key features include:

■ Separate breakpoint comparators for program, data, and DMA
accesses, providing on-chip hardware breakpoint capabilities for fast
debug and development

■ Discontinuity stack for hardware trace, facilitating fast debug and 
development 

■ Event counter for accurate benchmarking and profiling 

■ JTAG interface for standard system connection 






□ On-chip program cache and dual-access/single-cycle RAM for
increased memory access performance. On-chip memory key features
include:

■ 512-byte instruction cache for increased system performance 

■ 8 Kbytes of single-cycle, dual-access program or data RAM for
increased system performance and lower system cost

■ Bootloader (ROM based) supporting program bootup via 8-, 16-, or
32-bit memories or over any one of the communication ports

□ Separate internal program, data, and DMA coprocessor buses to support
massive concurrent I/O of program and data, thereby maximizing sustained
CPU performance.

Summed up, the total device performance is 275 MOPS and 320 Mbytes/sec,
as noted below.



TMS320C40 Performance (40-ns cycle time)

Sustained computation:

• High-Performance CPU

• DMA Coprocessor

Sustained I/O:

• Communication Ports

• DMA Coprocessor

• Global and Local Buses

CPU AND DMA PERFORMANCE

CPU: 8 OPS/cycle = 200 MOPS

• 2 Data Accesses                    50 MOPS
• 1 FP Multiply                      25 MOPS
• 1 FP ALU Operation                 25 MOPS
• 2 Addr. Register Mods              50 MOPS
• 1 Loop Counter Update              25 MOPS
• 1 Branch                           25 MOPS

DMA COPROCESSOR: 3 OPS/cycle = 75 MOPS

• 1 Data Access                      25 MOPS
• 1 Addr. Register Mod               25 MOPS
• 1 Transfer Counter Update          25 MOPS

TOTAL MOPS = 275 MOPS

DATA THROUGHPUT

Global Port                          100 Mbytes/sec
Local Port                           100 Mbytes/sec
6 Com Ports                          120 Mbytes/sec

TOTAL I/O = 320 Mbytes/sec
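The totals in this summary follow directly from the per-cycle operation counts and the 40-ns cycle (25 million cycles per second). The C sketch below reproduces that arithmetic, assuming the per-cycle counts listed above and the 20-Mbyte/s per-port rate quoted earlier; the function and macro names are hypothetical.

```c
#include <assert.h>

/* Peak-rate arithmetic for the TMS320C40 performance summary.
   Assumes a 40-ns instruction cycle, i.e. 25 million cycles per second. */
#define MCYCLES_PER_SEC 25

/* Operations the CPU can perform per cycle, as listed in the summary. */
static const int cpu_ops[] = {
    2, /* data accesses */
    1, /* floating-point multiply */
    1, /* floating-point ALU operation */
    2, /* auxiliary register modifications */
    1, /* loop counter update */
    1  /* branch */
};

/* Operations the DMA coprocessor can perform per cycle. */
static const int dma_ops[] = {
    1, /* data access */
    1, /* address register modification */
    1  /* transfer counter update */
};

#define COUNT(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Sums a small array of per-cycle operation counts. */
static int ops_per_cycle(const int *ops, int n) {
    assert(n > 0);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += ops[i];
    return total;
}

/* MOPS = operations per cycle * millions of cycles per second. */
int cpu_mops(void)   { return ops_per_cycle(cpu_ops, COUNT(cpu_ops)) * MCYCLES_PER_SEC; }
int dma_mops(void)   { return ops_per_cycle(dma_ops, COUNT(dma_ops)) * MCYCLES_PER_SEC; }
int total_mops(void) { return cpu_mops() + dma_mops(); }

/* Aggregate I/O: global and local ports at 100 Mbytes/sec each, plus six
   communication ports at 20 Mbytes/sec each. */
int total_io_mbytes_per_sec(void) { return 2 * 100 + 6 * 20; }
```

With these counts, cpu_mops() returns 200, dma_mops() returns 75, and the totals come to 275 MOPS and 320 Mbytes/sec, matching the summary.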
1.3.2 Communication Port Benefits

Without the six communication ports, 120 Mbytes/sec of processor
throughput must be squeezed over one or both of the external memory
interfaces, saturating them and turning the system into a complex shared
memory architecture. With the communication ports, bandwidth is plentiful
(illustrated in Figure 1-2).

Figure 1-2. TMS320C40 Throughput Increases Use of Communication Ports 
1.3.3 DMA Coprocessor Benefits

Without the DMA coprocessor, the CPU would have to spend computational
MOPS on transferring data within the processor's memory map. With the DMA
coprocessor, the CPU can focus its entire 200 MOPS of performance on
computational tasks while the DMA coprocessor handles the burdensome I/O.
This is illustrated in Figure 1-3.



Figure 1-3. TMS320C40 Throughput Increases Use of DMA Coprocessor 
1.3.4 TMS320C40 Parallel Processing Development Tools Key Features

The primary TMS320C4x development tools are as follows: 

□ Parallel processing in-circuit emulator (XDS510)

■ Able to debug both C and assembly code simultaneously using the 
graphical user-interface based source-level debugger 

■ Can debug any number of TMS320C4x devices in a system with a
single XDS510 controller card

■ Can globally stop, start, and single-step all or any combination of
'C40s in a system

□ Parallel processing development system 

■ Host-independent evaluation board with four 'C40s 

■ Each 'C40 connected to every other 'C40 via their communication
ports, enabling designers to efficiently test different system
topologies

■ Interfaces directly to XDS510 emulator, creating a complete 
parallel processing development environment. 

□ Parallel processing optimizing ANSI C compiler 

■ Parallel runtime support library for easy implementation of data and 
message passing between tasks (or processors) in parallel 
processing systems 

■ C-source and target-specific optimizations for dense, optimal code 

■ Plum-Hall validated to ANSI standard for maximum code portability 

□ SPOX parallel processing DSP operating system 

■ Parallel processing support for easy message passing within a 
multitasking environment 

■ Communication port, DMA coprocessor, and memory interface 
drivers for fast development of C code without detailed knowledge 
of the hardware 

■ Multitasking real-time kernel for fast implementation of
multitasking systems

■ DSP math library for fast development of DSP applications (using 
optimized assembly language routines) 

□ Parallel processing assembler/linker 

■ Directives to map program code and data onto specific processors for
fast integration and debug of parallel processing code

■ Relocatable modules for maximum code flexibility 
□ Hardware verification and full functional models

■ Simulation of multiple 'C40s and associated logic for accurate
development (via software simulation) of parallel processing
systems

■ Accurate simulation of device bus cycles and functional execution 
for fast development of product hardware 

■ Supports various workstation and PC environments 

□ State-accurate simulator

■ Provides cycle-by-cycle simulation of all aspects of the
TMS320C4x

■ Low-cost way to simulate key software kernels 

■ Supported on a host of workstation and PC platforms 
1.4 Applications

Below is a list of classical DSP applications along with a number of
embedded real-time applications that need the computational performance
offered by TMS320 devices. Real-time performance, low device costs, and
comprehensive development tools are the primary aspects that make Texas
Instruments' TMS320 devices the preferred solution in the following
applications:



Figure 1-4. Matrix of TMS320 DSP Applications 
General-Purpose DSP: Digital Filtering, Convolution, Correlation, Hilbert
Transforms, Fast Fourier Transforms, Adaptive Filtering, Windowing,
Waveform Generation

Graphics/Imaging: 3-D Transformations, Rendering, Robot Vision, Image
Transmission/Compression, Pattern Recognition, Image Enhancement,
Homomorphic Processing, Workstations, Animation/Digital Map

Instrumentation: Spectrum Analysis, Function Generation, Pattern Matching,
Seismic Processing, Transient Analysis, Digital Filtering, Phase-Locked
Loops

Voice/Speech: Voice Mail, Speech Vocoding, Speech Recognition, Speaker
Verification, Speech Enhancement, Speech Synthesis, Text-to-Speech,
Neural Networks

Control: Disk Control, Servo Control, Robot Control, Laser Printer Control,
Engine Control, Motor Control, Kalman Filtering

Military: Secure Communications, Radar Processing, Sonar Processing, Image
Processing, Navigation, Missile Guidance, Radio Frequency Modems, Sensor
Fusion

Telecommunications: Digital PBXs, Line Repeaters, Channel Multiplexing,
Adaptive Equalizers, FAX, Spread Spectrum Communications

Automotive
Chapter 2



Architectural Overview



The TMS320C40's high performance is achieved through the precision and
wide dynamic range of the floating-point units, large on-chip memory, a high
degree of parallelism, and the six-channel DMA coprocessor. Figure 2-1,
beginning on the next page, is a block diagram of the TMS320C40.

This chapter gives an architectural overview of the TMS320C40 processor.
Major areas of discussion are listed below.

Section Page 

2.1 Central Processing Unit (CPU) 2-4 

■ Floating-point/integer multiplier 2-4 

■ ALU for floating-point, integer, and logical operations 2-4

■ 32-bit barrel shifter 2-4 

■ Internal buses (CPU1/CPU2 and REG1/REG2) 2-4 

■ Auxiliary register arithmetic units (ARAUs) 2-6 

■ Primary register file 2-6 

■ CPU expansion register file 2-9 

2.2 Memory Organization 2-10

■ RAM, ROM, and cache 2-10 

■ Memory maps 2-12 

■ Memory addressing modes 2-15 

2.3 Instruction Set Summary 2-16

2.4 Internal Bus Operation 2-26 

2.5 External Bus Operation 2-27 

2.6 Peripherals 2-28 

■ Communication ports 2-29 

■ Direct memory access (DMA) coprocessor 2-29 

■ Timers 2-29 
Figure 2-1. TMS320C40 Block Diagram



Figure 2-1. TMS320C40 Block Diagram (Concluded) 
2.1 Central Processing Unit (CPU)

The TMS320C40 has a register-based CPU architecture. The CPU
comprises the following components:

□ Floating-point/integer multiplier 

□ ALU for performing arithmetic: floating-point, integer, and logical
operations

□ 32-bit barrel shifter 

□ Internal buses (CPU1/CPU2 and REG1/REG2) 

□ Auxiliary register arithmetic units (ARAUs) 

□ CPU register file 

Figure 2-2 shows the various CPU components that are discussed in the 
succeeding subsections. 

2.1.1 Multiplier 

The multiplier performs single-cycle multiplications on 32-bit integer and
40-bit floating-point values. The TMS320C40 implementation of
floating-point arithmetic allows floating-point operations at fixed-point
speeds via a 40-ns instruction cycle and a high degree of parallelism. To
gain even higher throughput, you can use parallel instructions to perform a
multiply and an ALU operation in a single cycle.

When the multiplier performs floating-point multiplication, the inputs are
40-bit floating-point numbers, and the result is a 40-bit floating-point
number. When the multiplier performs integer multiplication, the inputs are
32-bit integers, and the result is either the 32 most significant bits or the
32 least significant bits of the 64-bit product. Refer to Chapter 4 for
detailed information on data formats and floating-point operation.
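To make the integer case concrete, the following C sketch forms the full 64-bit product of two signed 32-bit operands and extracts either 32-bit half. It models only the arithmetic; which 'C40 instruction returns which half is covered in Chapter 11, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit product of two signed 32-bit integers. */
static int64_t full_product(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* Bit pattern of the 32 least significant bits of the product. */
uint32_t product_lsbs(int32_t a, int32_t b) {
    return (uint32_t)full_product(a, b);
}

/* Bit pattern of the 32 most significant bits of the product. */
uint32_t product_msbs(int32_t a, int32_t b) {
    return (uint32_t)((uint64_t)full_product(a, b) >> 32);
}
```

For example, 0x10000 x 0x10000 = 2^32, so the 32 LSBs are 0 and the 32 MSBs are 1; keeping only the low half would silently lose this result.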

2.1.2 Arithmetic Logic Unit (ALU)

The ALU performs single-cycle operations on 32-bit integer, 32-bit logical,
and 40-bit floating-point data, including single-cycle integer and
floating-point conversions. Results of the ALU are always maintained in 32-bit
integer or 40-bit floating-point formats. The barrel shifter is used to shift up
to 32 bits left or right in a single cycle.

Internal buses, CPU1/CPU2 and REG1/REG2, carry two operands from
memory and two operands from the register file, thus allowing parallel
multiplies and adds/subtracts on four integer or floating-point operands in a
single cycle.
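As an illustration of the multiply/ALU overlap, the C loop below computes a dot product so that each iteration forms one new product and adds the product from the previous iteration. This only models the arithmetic, under the assumption that each iteration corresponds to one parallel multiply-and-add; it does not reproduce the 'C40 pipeline, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product arranged so each iteration pairs one multiply with one add.
   The add consumes the product formed in the previous iteration, which is
   the kind of overlap a parallel multiply/ALU instruction pair exploits. */
float dot_product(const float *x, const float *h, size_t n) {
    assert(n == 0 || (x != NULL && h != NULL));
    float acc = 0.0f;   /* running sum (ALU side) */
    float prod = 0.0f;  /* most recent product (multiplier side) */
    for (size_t i = 0; i < n; i++) {
        acc += prod;         /* add the previous product */
        prod = x[i] * h[i];  /* form the next product */
    }
    return acc + prod;       /* fold in the final product */
}
```

Splitting the work this way keeps the multiply and the add independent within an iteration, which is what allows them to be issued together.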
Figure 2-2. Central Processing Unit (CPU)
2.1.3 Auxiliary Register Arithmetic Units (ARAUs)

Two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate
two addresses in a single cycle. The ARAUs operate in parallel with the
multiplier and ALU. They support addressing with displacements, index
registers (IR0 and IR1), and circular and bit-reversed addressing. Refer to
Chapter 5 for a description of addressing modes.
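Bit-reversed addressing is typically used to reorder FFT data. The sketch below models it as an addition in which the carry propagates from the most significant bit toward the least significant bit, restricted to the low N bits of the address. The function and its bits parameter are hypothetical, and the exact 'C40 rules (including buffer alignment and the choice of index value) are given in Chapter 5.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse-carry add: adds inc to addr over the low `bits` bits, with the
   carry moving from the MSB down to the LSB. Higher address bits (the
   buffer base) are left unchanged; any carry out of bit 0 is discarded. */
uint32_t bitrev_add(uint32_t addr, uint32_t inc, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = (1u << bits) - 1u;
    uint32_t result = 0, carry = 0;
    for (int i = (int)bits - 1; i >= 0; i--) {
        uint32_t sum = ((addr >> i) & 1u) + ((inc >> i) & 1u) + carry;
        result |= (sum & 1u) << i;
        carry = sum >> 1;
    }
    return (addr & ~mask) | result;
}
```

For an 8-point FFT (bits = 3, increment 4), repeated calls starting at 0 visit 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order of 0 through 7.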

2.1.4 CPU Primary Register File 

The TMS320C40 primary register file provides 32 registers in a multiport 
register file that is tightly coupled to the CPU. Table 2-1 lists register names 
and functions, followed by the section number and page of each description. 
(The expansion register file is described in subsection 2.1.5 on page 2-9.)

All of the primary register file registers can be operated upon by the
multiplier and ALU, and can be used as general-purpose registers. However,
the registers also have some special functions. For example, the 12
extended-precision registers are especially suited for maintaining
floating-point results. The eight auxiliary registers support a variety of
indirect addressing modes and can be used as general-purpose 32-bit integer
and logical registers. The remaining registers provide system functions such
as addressing, stack management, processor status, interrupts, and block
repeat. Refer to Chapter 3 for detailed information on the CPU registers.
Refer to Chapter 5 for register usage in addressing.

The extended-precision registers (R0-R11) are capable of storing and 
supporting operations on 32-bit integer and 40-bit floating-point numbers. 
Any instruction that assumes the operands are floating-point numbers uses 
bits 39-0. If the operands are either signed or unsigned integers, only bits 
31-0 are used, and bits 39-32 remain unchanged. This is true for all shift 
operations. Refer to Chapter 4 for extended-precision register formats for 
floating-point and integer numbers. 
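The following C sketch models a 40-bit extended-precision register as the low 40 bits of a 64-bit integer and shows that an integer write replaces bits 31-0 while bits 39-32 keep their previous value. The representation and function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 40-bit extended-precision register (R0-R11) held in
   the low 40 bits of a uint64_t. */
#define EXT_MASK 0xFFFFFFFFFFull /* bits 39-0: floating-point view */
#define INT_MASK 0xFFFFFFFFull   /* bits 31-0: integer view */

/* Integer result written to the register: bits 31-0 are replaced,
   bits 39-32 keep their previous contents. */
uint64_t ext_write_int(uint64_t reg, uint32_t value) {
    assert((reg & ~EXT_MASK) == 0); /* the model uses only 40 bits */
    return (reg & ~INT_MASK) | value;
}

/* Integer operand: only bits 31-0 are used. */
uint32_t ext_read_int(uint64_t reg) {
    return (uint32_t)(reg & INT_MASK);
}

/* Floating-point operand: all of bits 39-0 are used. */
uint64_t ext_read_float_bits(uint64_t reg) {
    return reg & EXT_MASK;
}
```

Writing the integer 0xDEADBEEF into a register holding 0xAB12345678 leaves 0xABDEADBEEF: the upper 8 bits (0xAB) survive the integer operation.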

The 32-bit auxiliary registers (AR0-AR7) can be accessed by the CPU
and modified by the two auxiliary register arithmetic units (ARAUs). The
primary function of the auxiliary registers is the generation of 32-bit
addresses. They can also be used as loop counters or as 32-bit
general-purpose registers that can be modified by the multiplier and ALU.
Refer to Chapter 5 for detailed information and examples of the use of
auxiliary registers in addressing.
Table 2-1. CPU Primary Registers
Register   Assigned Function                      Section   Page

AR0        Auxiliary register 0                   3.1.2     3-5
AR1        Auxiliary register 1                   3.1.2     3-5
AR2        Auxiliary register 2                   3.1.2     3-5
AR3        Auxiliary register 3                   3.1.2     3-5
AR4        Auxiliary register 4                   3.1.2     3-5
AR5        Auxiliary register 5                   3.1.2     3-5
AR6        Auxiliary register 6                   3.1.2     3-5
AR7        Auxiliary register 7                   3.1.2     3-5
DP         Data-page pointer                      3.1.3     3-5
IR0        Index register 0                       3.1.4     3-5
IR1        Index register 1                       3.1.4     3-5
BK         Block-size register                    3.1.5     3-5
SP         System stack pointer                   3.1.6     3-5
ST         Status register                        3.1.7     3-5
DIE        DMA coprocessor interrupt enable       3.1.8     3-8
IIE        Internal-interrupt enable register     3.1.9     3-10
IIF        IIOF flag register                     3.1.10    3-12
RS         Repeat start address                   3.1.11    3-14
RE         Repeat end address                     3.1.11    3-14
RC         Repeat counter                         3.1.11    3-14



The data page pointer (DP) is a 32-bit register. The 16 LSBs of the data
page pointer are used by the direct addressing mode as a pointer to the page
of data being addressed. The 'C40 can address up to 64K pages, each page
containing 64K words. The data page pointer is illustrated in Figure 5-1
on page 5-4.
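A sketch of how a direct address can be formed from the page pointer, assuming the instruction supplies a 16-bit offset within the page (an assumption here; Chapter 5 gives the actual encoding). The 16 LSBs of DP select one of 64K pages, and the offset selects one of 64K words in that page, which together span 64K x 64K = 4G words. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Direct-address sketch: DP bits 15-0 select the page (address bits 31-16)
   and the 16-bit offset selects the word within that page (bits 15-0).
   DP bits 31-16 are ignored. */
uint32_t direct_address(uint32_t dp, uint16_t offset) {
    return ((dp & 0xFFFFu) << 16) | offset;
}
```

Because only the 16 LSBs of DP participate, DP values 0000 0005h and 1234 0005h select the same page.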

The 32-bit index registers (IR0 and IR1) contain the values used by the
auxiliary register arithmetic units (ARAUs) to compute indexed addresses.
Refer to Chapter 5 for examples of the use of index registers in addressing
(see subsection 5.1.3, page 5-5, and Section 5.4, page 5-30).

The ARAUs use the 32-bit block-size register (BK) in circular addressing
to specify the data block size. (Circular addressing is described in Section
5.3 on page 5-25.)
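A simplified C model of circular addressing: an address stepped through a buffer of BK words wraps around at either end. This sketch works on an explicit base address and assumes the step magnitude does not exceed BK; it does not model the alignment requirements or exact algorithm of the 'C40 hardware, which Section 5.3 describes. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: next address when stepping through a circular buffer
   of bk words that starts at base. Assumes |step| <= bk and that addr lies
   inside the buffer. */
uint32_t circ_next(uint32_t base, uint32_t bk, uint32_t addr, int32_t step) {
    assert(bk > 0 && addr - base < bk);
    assert(step <= (int64_t)bk && step >= -(int64_t)bk);
    int64_t idx = (int64_t)(addr - base) + step;
    if (idx >= (int64_t)bk)
        idx -= bk;   /* wrapped past the end of the buffer */
    else if (idx < 0)
        idx += bk;   /* wrapped before the start of the buffer */
    return base + (uint32_t)idx;
}
```

With a 5-word buffer at 100h, stepping +1 from 104h returns to 100h, and stepping -1 from 100h goes to 104h.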






The system stack pointer (SP) is a 32-bit register that contains the
address of the top of the system stack. The SP always points to the last
element pushed onto the stack. A push performs a preincrement, and a pop
performs a postdecrement of the system stack pointer. The SP is manipulated
by interrupts, traps, calls, returns, and the PUSH and POP instructions.
Refer to Section 5.5, page 5-31, for information about system stack
management.
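The preincrement/postdecrement rule means the stack grows toward higher addresses and SP always addresses the current top. The C sketch below models this with a small array; the struct, its size, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16 /* hypothetical size for this model */

/* Hypothetical model of the system stack. sp holds the index of the last
   element pushed, so the stack grows toward higher addresses. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;
} stack_model;

/* PUSH: preincrement SP, then store at the new top. */
void stack_push(stack_model *s, uint32_t value) {
    s->sp += 1;
    assert(s->sp < STACK_WORDS);
    s->mem[s->sp] = value;
}

/* POP: read the top element, then postdecrement SP. */
uint32_t stack_pop(stack_model *s) {
    assert(s->sp < STACK_WORDS);
    uint32_t value = s->mem[s->sp];
    s->sp -= 1;
    return value;
}
```

Starting with SP = 0, two pushes leave SP = 2 with the second value at index 2; each pop returns the top value and then moves SP back down by one.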

The status register (ST) contains global information relating to the state 
of the CPU. Typically, operations set the condition flags of the status register 
according to whether the result is zero, negative, etc. This includes register 
load and store operations as well as arithmetic and logical functions. When 
the status register is loaded, however, a bit-for-bit replacement is performed 
with the contents of the source operand, regardless of the state of any bits 
in the source operand. Therefore, following a load, the contents of the status 
register are identically equal to the contents of the source operand. This
allows the status register to be easily saved and restored. See Table 3-2 on
page 3-6 for definitions of the status register bits. 

The DMA coprocessor interrupt enable register (DIE) is a 32-bit register
containing 2- and 3-bit fields to designate the interrupt synchronization
scheme for each of the six DMA channels. It allows each DMA channel to
service a corresponding input communication port and output communication
port. Also, each DMA channel can be synchronized with external interrupts
or the on-chip timers. This register is described in subsection 3.1.8
on page 3-8.

The CPU internal interrupt enable register (IIE) is also a 32-bit register
(described in subsection 3.1.9 on page 3-10). This register enables and
disables interrupts for the six communication ports, both timers, and the
six DMA coprocessor channels.

The IIOF flag register (IIF) controls the function (general-purpose I/O or
interrupt) of the four external pins (IIOF0 to IIOF3). Interrupts can be
level or edge triggered. Subsection 3.1.10 on page 3-12 provides further
description.

The 32-bit repeat counter (RC) register specifies the number of times a 
block of code is to be repeated when performing a block repeat. When the 
processor is operating in the repeat mode, the 32-bit repeat start address 
register (RS) contains the starting address of the block of program memory 
to be repeated, and the 32-bit repeat end address register (RE) contains 
the ending address of the block to be repeated. Further information is in 
subsection 3.1.11 on page 3-14. 
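The interplay of RS, RE, and RC can be sketched as a small simulation. This model assumes, as on the TMS320C3x, that RC is decremented each time the instruction at RE completes and that the block restarts at RS while RC is still non-negative, so the block runs RC + 1 times; confirm this against the block-repeat description in Chapter 6. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical repeat-mode model. The program counter steps from RS to RE;
   each time the instruction at RE completes, RC is decremented, and the
   block restarts at RS while RC is still non-negative.
   Returns the number of passes through the block. */
int32_t block_passes(uint32_t rs, uint32_t re, int32_t rc) {
    assert(rs <= re && rc >= 0);
    int32_t passes = 0;
    uint32_t pc = rs;
    for (;;) {
        /* (the instruction at pc would execute here) */
        if (pc != re) {
            pc++;            /* fall through to the next instruction */
            continue;
        }
        passes++;            /* reached the end of the block */
        rc--;
        if (rc < 0)
            break;           /* leave repeat mode */
        pc = rs;             /* loop back to the repeat start address */
    }
    return passes;
}
```

Under this assumption, loading RC with 4 makes the block execute five times, and loading it with 0 executes the block once.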

The program counter (PC) is a 32-bit register containing the address of the
next instruction to be fetched. Although the PC is not part of the CPU
register file, it can be modified by instructions that modify the program
flow.

2.1.5 CPU Expansion Register File 

In addition to the CPU primary register file (covered in subsection 2.1.4,
starting on page 2-6), the 'C40 has an expansion register file that contains
two special registers that act as pointers:

□ IVTP register, which points to the interrupt-vector table (shown in
Figure 3-8 on page 3-16)

□ TVTP register, which points to the trap-vector table (TVT); the TVT
defines vectors for 512 traps (described in Figure 3-7 on page 3-15)

These two registers are fully described in Section 3.2 on page 3-15.
The total memory reach of the TMS320C40 is 4G (2^32) 32-bit words
(16 Gbytes). Program memory (on-chip RAM or ROM and external memory)
and the registers affecting the timers, communication ports, and DMA
channels are all contained within this space. This allows tables,
coefficients, program code, and data to be stored in either RAM or ROM.
Thus, memory usage is maximized, and memory space can be allocated as
desired.

By setting the level on one external pin (ROMEN, pin AK4), the first
one-megaword area of memory (0000 0000h to 000F FFFFh) can be configured
either as part of the local address bus or to address the on-chip ROM when
the boot loader is used (with the remaining space reserved). (This is further
discussed in Section 3.4 on page 3-18.)



2.2.1 RAM, ROM, and Cache 

Figure 2-3 shows how the memory is organized on the TMS320C40. RAM 
blocks 0 and 1 are 4K bytes (1K x 32 bits) each. The ROM block is reserved
and contains a boot loader. Each RAM and ROM block is capable of sup- 
porting two accesses in a single cycle. The separate program buses, data 
buses, and DMA buses allow for parallel program fetches, data reads and 
writes, and DMA operations. For example, the CPU can access two data
values in one RAM block and perform an external program fetch in parallel 
with the DMA coprocessor loading another RAM block, all within a single 
cycle. 



The reserved ROM block (upper right in Figure 2-3) contains a boot loader. 
This loader supports loading of program and data at reset time. Loading is 
from 8-, 16-, or 32-bit wide memories or any one of the six communication
ports. Section 13.2 (page 13-5) explains the boot loader in detail. 

A 128 x 32-bit instruction cache is provided to store often-repeated sections
of code, thus greatly reducing the number of needed off-chip accesses. This 
allows for code to be stored off-chip in slower, lower-cost memories. The ex- 
ternal buses are also freed for use by the DMA, external memory fetches, 
or other devices in the system. 

For further information about the memory and instruction cache, refer to 
Section 3.4 (memory organization — page 3-18) and Section 3.5 (cache 
memory — page 3-25). 

Figure 2-3. Memory Organization

2.2.2 Memory Maps

Two memory maps are available, as shown in Figure 2-4; the one selected
depends on the level at external pin ROMEN. Both maps in the figure
illustrate the 4-gigaword reach of the 'C40; however, they differ in the
first megaword of memory:

□ A one at external pin ROMEN (pin AK4) enables the internal ROM at
0000 0000h, with the remainder of the one-megaword space (0000 0000h -
000F FFFFh) reserved. This is shown in the right side of the figure.

□ A zero at ROMEN makes addresses 0000 0000h - 000F FFFFh accessible on
the local bus. This is shown in the left side of the figure.

The rest of the memory map is the same for either level of ROMEN: 

□ The second megaword of memory is devoted to peripherals (as shown 
in Figure 2-5). 

□ The third megaword of memory contains the two 1K-word (4K-byte) blocks
of RAM (BLK0 and BLK1, at 002F F800h - 002F FFFFh).

□ The rest of the first 2 gigawords (0030 0000h - 7FFF FFFFh) is on the
local bus (external).

□ The second 2 gigawords (8000 0000h - FFFF FFFFh) are on the global
bus (external).
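The map rules above can be expressed as a small address classifier. This Python sketch assumes BLK0 occupies 002F F800h - 002F FBFFh and BLK1 002F FC00h - 002F FFFFh (the two 1K blocks in order), and it labels the rest of the third megaword, which this section does not describe, as "other". The function name `memory_region` is hypothetical.

```python
def memory_region(addr: int, romen: int) -> str:
    """Classify a 32-bit word address using the memory map described above."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address must fit in 32 bits")
    if addr <= 0x000F_FFFF:                      # first megaword
        return "on-chip ROM / reserved" if romen else "local bus"
    if addr <= 0x001F_FFFF:                      # second megaword
        return "peripherals"
    if 0x002F_F800 <= addr <= 0x002F_FBFF:       # assumed BLK0 range
        return "RAM block 0"
    if 0x002F_FC00 <= addr <= 0x002F_FFFF:       # assumed BLK1 range
        return "RAM block 1"
    if addr <= 0x002F_FFFF:                      # rest of the third megaword
        return "other"
    if addr <= 0x7FFF_FFFF:                      # rest of the first 2 gigawords
        return "local bus"
    return "global bus"                          # second 2 gigawords


print(memory_region(0x0000_1000, romen=1))  # on-chip ROM / reserved
print(memory_region(0x0000_1000, romen=0))  # local bus
```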

Section 3.4 (page 3-18) describes the memory maps in greater detail.
Sections 7.1, 7.2, and 7.3, beginning on page 7-3, discuss the local and
global interfaces to these memories. The peripheral bus map and the vector
locations for reset, interrupts, and traps are also explained.

Figure 2-4. Memory Maps

Figure 2-5. Peripheral Memory Map

0010 0040h - 0010 004Fh   Communication Port 0 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 0050h - 0010 005Fh   Communication Port 1 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 0060h - 0010 006Fh   Communication Port 2 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 0070h - 0010 007Fh   Communication Port 3 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 0080h - 0010 008Fh   Communication Port 4 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 0090h - 0010 009Fh   Communication Port 5 (16 words)
                          (See paragraph 3.4.2.4, Figure 3-14, page 3-23)

0010 00A0h - 0010 00AFh   DMA Coprocessor Channel 0 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)

0010 00B0h - 0010 00BFh   DMA Coprocessor Channel 1 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)

0010 00C0h - 0010 00CFh   DMA Coprocessor Channel 2 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)

0010 00D0h - 0010 00DFh   DMA Coprocessor Channel 3 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)

0010 00E0h - 0010 00EFh   DMA Coprocessor Channel 4 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)

0010 00F0h - 0010 00FFh   DMA Coprocessor Channel 5 (16 words)
                          (See paragraph 3.4.2.5, Figure 3-15, page 3-24)
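Because every communication port and DMA channel occupies a 16-word block, the block for port or channel n is a fixed offset from the first one. The Python sketch below computes those ranges; the constant and function names are hypothetical, and the port 0 base (0010 0040h) is inferred from the 16-word spacing of the blocks listed above.

```python
COMM_PORT_0 = 0x0010_0040    # start of the communication port 0 block (inferred)
DMA_CHANNEL_0 = 0x0010_00A0  # start of the DMA channel 0 block
BLOCK_WORDS = 16             # each port or channel occupies 16 words


def _block(base: int, n: int) -> range:
    if not 0 <= n <= 5:
        raise ValueError("ports and channels are numbered 0-5")
    start = base + n * BLOCK_WORDS
    return range(start, start + BLOCK_WORDS)


def comm_port_block(n: int) -> range:
    """Word addresses of communication port n's 16-word register block."""
    return _block(COMM_PORT_0, n)


def dma_channel_block(n: int) -> range:
    """Word addresses of DMA channel n's 16-word register block."""
    return _block(DMA_CHANNEL_0, n)


r = dma_channel_block(5)
print(f"{r[0]:08X}h - {r[-1]:08X}h")  # 001000F0h - 001000FFh
```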

2.2.3 Memory Addressing Modes

The TMS320C40 supports a base set of general-purpose instructions as well as
arithmetic-intensive instructions that are particularly suited for digital
signal processing and other numeric-intensive applications. Refer to
Chapter 5 for detailed information on addressing.

The TMS320C40 provides four groups of addressing modes. Each group uses two
or more addressing types, as listed below:

1) General addressing modes:

■ Register. The operand is a CPU register.

■ Immediate. The operand is a 16-bit immediate value.

■ Direct. The operand is the contents of a 32-bit address 
(concatenation of 16 bits of the data page pointer and a 16-bit 
operand). 

■ Indirect. A 32-bit auxiliary register indicates the address of the 
operand. 

2) Three-operand addressing modes: 

■ Register (same as for general addressing mode). 

■ Indirect (same as for general addressing mode). 

■ Immediate (same as for general addressing mode). 

3) Parallel addressing modes: 

■ Register. The operand is an extended-precision register. 

■ Indirect (same as for general addressing mode). 

4) Branch addressing modes: 

■ Register (same as for general addressing mode). 

■ PC-relative. A signed 16-bit displacement or a 24-bit displacement
is added to the PC.
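Two of these address calculations can be sketched in Python. The direct-mode function assumes the "16 bits of the data page pointer" are the 16 LSBs of DP. The PC-relative function follows the operation column of Table 2-2 (the displacement is added to PC + 1 for a standard branch and to PC + 3 for a delayed branch) and treats both displacement widths as two's-complement fields, which is an assumption for the 24-bit case. All function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def direct_address(dp: int, offset: int) -> int:
    """Direct mode: 16 LSBs of DP concatenated with a 16-bit operand."""
    return ((dp & 0xFFFF) << 16) | (offset & 0xFFFF)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pc_relative_target(pc: int, disp: int, bits: int, delayed: bool) -> int:
    """Target of a PC-relative branch located at address pc."""
    base = pc + (3 if delayed else 1)
    return (base + sign_extend(disp, bits)) & MASK32


print(hex(direct_address(0x0030, 0x1234)))                    # 0x301234
print(hex(pc_relative_target(0x1000, 5, 16, delayed=True)))   # 0x1008
```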

2.3 Instruction Set Summary

Table 2-2 lists the TMS320C40 instruction set in alphabetical order. Each
table entry shows the instruction mnemonic, description, and operation. Refer
to Chapter 11 for a functional listing of the instructions and individual
instruction descriptions.

Table 2-2. Instruction Set Summary

Mnemonic   Description                                   Operation
ABSF       Absolute value of a floating-point number     |src| -> Rn
ABSI       Absolute value of an integer                  |src| -> Dreg
ADDC       Add integers with carry                       src + Dreg + C -> Dreg
ADDC3      Add integers with carry (3-operand)           src1 + src2 + C -> Dreg
ADDF       Add floating-point values                     src + Rn -> Rn
ADDF3      Add floating-point values (3-operand)         src1 + src2 -> Rn
ADDI       Add integers                                  src + Dreg -> Dreg
ADDI3      Add integers (3-operand)                      src1 + src2 -> Dreg
AND        Bitwise logical-AND                           Dreg AND src -> Dreg
AND3       Bitwise logical-AND (3-operand)               src1 AND src2 -> Dreg
ANDN       Bitwise logical-AND with complement           Dreg AND (NOT src) -> Dreg
ANDN3      Bitwise logical-ANDN (3-operand)              src1 AND (NOT src2) -> Dreg
ASH        Arithmetic shift                              If count > 0:
                                                           (Dreg shifted left by count) -> Dreg
                                                         Else:
                                                           (Dreg shifted right by |count|) -> Dreg
ASH3       Arithmetic shift (3-operand)                  If count > 0:
                                                           (src shifted left by count) -> Dreg
                                                         Else:
                                                           (src shifted right by |count|) -> Dreg

LEGEND:
src     general addressing modes
src1    three-operand addressing modes
src2    three-operand addressing modes
Csrc    conditional-branch addressing modes
count   shift value (general addressing modes)
Dreg    register address (any register)
Sreg    register address (any register)
Rn      register address (R0-R11)
ARn     auxiliary register n (AR0-AR7)
Daddr   destination memory address
addr    24-bit immediate address (label)
cond    condition code (see Table 11-8)
ST      status register
GIE     global interrupt enable bit of ST
RM      repeat mode bit of ST
RS      repeat start address register
RE      repeat end address register
PC      program counter
SP      stack pointer
TOS     top of stack
C       carry bit

Table 2-2. Instruction Set Summary (Continued)

Mnemonic   Description                                   Operation
Bcond      Branch conditionally (standard)               If cond = true:
                                                           If Csrc is a register, Csrc -> PC
                                                           If Csrc is a value, Csrc + PC + 1 -> PC
                                                         Else: PC + 1 -> PC
BcondAF    Branch conditionally delayed and annul if     If cond is true:
           false                                           If src is a register: src -> PC
                                                           If src is a displacement:
                                                             src + PC of branch + 3 -> PC
                                                         Else: annul execute-phase results of the
                                                           next 3 instructions and continue
BcondAT    Branch conditionally delayed and annul if     If cond is true:
           true                                            If src is a register:
                                                             src -> PC
                                                             annul execute-phase results of the
                                                             next 3 instructions
                                                           If src is a displacement:
                                                             src + PC of branch + 3 -> PC
                                                             annul execute-phase results of the
                                                             next 3 instructions
                                                         Else: continue
BcondD     Branch conditionally (delayed)                If cond = true:
                                                           If Csrc is a register, Csrc -> PC
                                                           If Csrc is a value, Csrc + PC + 3 -> PC
                                                         Else: PC + 1 -> PC
BR         Branch unconditionally (standard)             Csrc + PC + 1 -> PC
BRD        Branch unconditionally (delayed)              Csrc + PC + 3 -> PC
CALL       Call subroutine                               PC + 1 -> TOS
                                                         Csrc + PC + 1 -> PC
CALLcond   Call subroutine conditionally                 If cond = true:
                                                           PC + 1 -> TOS
                                                           If Csrc is a register, Csrc -> PC
                                                           If Csrc is a value, Csrc + PC -> PC
                                                         Else: PC + 1 -> PC
CMPF       Compare floating-point values                 Set flags on Rn - src
CMPF3      Compare floating-point values (3-operand)     Set flags on src1 - src2
CMPI       Compare integers                              Set flags on Dreg - src
CMPI3      Compare integers (3-operand)                  Set flags on src1 - src2
DBcond     Decrement and branch conditionally            ARn - 1 -> ARn
           (standard)                                    If cond = true and ARn > 0:
                                                           If Csrc is a register, Csrc -> PC
                                                           If Csrc is a value, Csrc + PC + 1 -> PC
                                                         Else: PC + 1 -> PC

Table 2-2. Instruction Set Summary (Continued)

Mnemonic   Description                                   Operation
DBcondD    Decrement and branch conditionally            ARn - 1 -> ARn
           (delayed)                                     If cond = true and ARn > 0:
                                                           If Csrc is a register, Csrc -> PC
                                                           If Csrc is a value, Csrc + PC + 3 -> PC
                                                         Else: PC + 1 -> PC
FIX        Convert floating-point value to integer       Fix(src) -> Dreg
FLOAT      Convert integer to floating-point value       Float(src) -> Rn
FRIEEE     Convert from IEEE format                      Convert src from IEEE format -> Dreg
IACK       Interrupt acknowledge                         Perform a dummy read with IACK = 0
                                                         At end of dummy read, set IACK = 1
IDLE       Idle until interrupt                          PC + 1 -> PC, then idle until next
                                                         interrupt
LATcond    Link and trap conditionally                   If cond = true:
                                                           ST(GIE) -> ST(PGIE)
                                                           ST(CF) -> ST(PCF)
                                                           0 -> ST(GIE)
                                                           1 -> ST(CF)
                                                           PC of LATcond + 4 -> R11
                                                           Trap vector N -> PC
                                                         Else: continue
LAJ        Link and jump                                 PC + 4 -> R11
                                                         PC of LAJ + 3 + src -> PC
LAJcond    Link and jump conditionally                   If cond is true and src is a register:
                                                           PC of LAJcond + 4 -> R11, and
                                                           src -> PC
                                                         If cond is true and src is a displacement:
                                                           PC of LAJcond + 4 -> R11, and
                                                           src + PC of LAJcond + 3 -> PC
                                                         Else: continue
LBb        Load byte                                     Sign-extended byte (byte 3, 2, 1, or 0)
                                                         of src -> Dreg
LBUb       Load byte unsigned                            Unsigned byte (byte 3, 2, 1, or 0)
                                                         of src -> Dreg
LDA        Load address register                         src -> Dreg
LDE        Load floating-point exponent                  src(exponent) -> Rn(exponent)
LDEP       Load integer from expansion register file     src -> Dreg
           to primary register file


Table 2-2. Instruction Set Summary (Continued)

Mnemonic   Description                                   Operation
LDF        Load floating-point value                     src -> Rn
LDFcond    Load floating-point value conditionally       If cond = true, src -> Rn
                                                         Else: Rn is not changed
LDFI       Load floating-point value, interlocked        Signal interlocked operation
                                                         src -> Rn
LDHI       Load 16 MSBs with 16-bit immediate            src -> 16 MSBs of Dreg
LDI        Load integer                                  src -> Dreg
LDIcond    Load integer conditionally                    If cond = true, src -> Dreg
                                                         Else: Dreg is not changed
LDII       Load integer, interlocked                     Signal interlocked operation
                                                         src -> Dreg
LDM        Load floating-point mantissa                  src(mantissa) -> Rn(mantissa)
LDP        Load data page pointer                        src -> data page pointer
LDPE       Load integer from primary register file       src -> Dreg
           to expansion register file
LDPK       Load data page pointer immediate              src -> DP
LHw        Load half word                                Sign-extended half word of src -> Dreg
LHUw       Load half word unsigned                       Unsigned half word of src -> Dreg
LSH        Logical shift                                 If count > 0:
                                                           (Dreg left-shifted by count) -> Dreg
                                                         Else:
                                                           (Dreg right-shifted by |count|) -> Dreg
LSH3       Logical shift (3-operand)                     If count > 0:
                                                           (src left-shifted by count) -> Dreg
                                                         Else:
                                                           (src right-shifted by |count|) -> Dreg
LWLct      Load word, left shifted                       src << (0, 1, 2, 3) bytes, merged with
                                                         Dreg -> Dreg
LWRct      Load word, right shifted                      src >> (0, 1, 2, 3) bytes, merged with
                                                         Dreg -> Dreg
MBct       Merge byte, left shifted                      8 LSBs of src << (0, 1, 2, 3) bytes,
                                                         merged with Dreg -> Dreg
MHct       Merge half word, left shifted                 16 LSBs of src << (0, 1) half words,
                                                         merged with Dreg -> Dreg
MPYF       Multiply floating-point values                src x Rn -> Rn
MPYF3      Multiply floating-point values (3-operand)    src1 x src2 -> Rn
MPYI       Multiply integers                             src x Dreg -> Dreg
MPYI3      Multiply integers (3-operand)                 src1 x src2 -> Dreg

Table 2-2. Instruction Set Summary (Continued)

Mnemonic   Description                                   Operation
MPYSHI     Multiply signed integer and produce 32 MSBs   Dreg x src -> Dreg
MPYSHI3    Multiply signed integer and produce 32 MSBs   src1 x src2 -> Dreg
           (3-operand)
MPYUHI     Multiply unsigned integer and produce 32      Dreg x src -> Dreg
           MSBs
MPYUHI3    Multiply unsigned integer and produce 32      src1 x src2 -> Dreg
           MSBs (3-operand)
NEGB       Negate integer with borrow                    0 - src - C -> Dreg
NEGF       Negate floating-point value                   0 - src -> Rn
NEGI       Negate integer                                0 - src -> Dreg
NOP        No operation                                  Modify ARn if specified
NORM       Normalize floating-point value                Normalize(src) -> Rn
NOT        Bitwise logical-complement                    NOT src -> Dreg
OR         Bitwise logical-OR                            Dreg OR src -> Dreg
OR3        Bitwise logical-OR (3-operand)                src1 OR src2 -> Dreg
POP        Pop integer from stack                        *SP-- -> Dreg
POPF       Pop floating-point value from stack           *SP-- -> Rn
PUSH       Push integer on stack                         Sreg -> *++SP
PUSHF      Push floating-point value on stack            Rn -> *++SP
RCPF       Reciprocal floating point                     16-bit reciprocal of src -> Rn
RETScond   Return from subroutine conditionally          If cond = true or missing:
                                                           *SP-- -> PC
                                                         Else: continue
RND        Round floating-point value                    Round(src) -> Rn


Table 2-2. Instruction Set Summary (Continued)

Mnemonic   Description                                   Operation
ROL        Rotate left                                   Dreg rotated left 1 bit -> Dreg
ROLC       Rotate left through carry                     Dreg rotated left 1 bit through carry
                                                         -> Dreg
ROR        Rotate right                                  Dreg rotated right 1 bit -> Dreg
RORC       Rotate right through carry                    Dreg rotated right 1 bit through carry
                                                         -> Dreg
RPTB       Repeat block of instructions                  src -> RE
                                                         1 -> ST(RM)
                                                         Next PC -> RS
RPTBD      Repeat block delayed                          If src is an immediate value (displacement):
                                                           src + PC + 3 -> RE
                                                         Else:
                                                           src -> RE
                                                         1 -> ST(RM)
                                                         PC of RPTBD + 4 -> RS
RPTS       Repeat single instruction                     src -> RC
                                                         1 -> ST(RM)
                                                         Next PC -> RS
                                                         Next PC -> RE
RSQRF      Reciprocal of square root floating point      16-bit reciprocal of square root of src
                                                         -> Dreg
SIGI       Signal, interlocked                           Signal interlocked operation
                                                         Wait for interlock acknowledge
                                                         Clear interlock
STF        Store floating-point value                    Rn -> Daddr
STFI       Store floating-point value, interlocked       Rn -> Daddr
                                                         Signal end of interlocked operation
STI        Store integer                                 Sreg -> Daddr
STII       Store integer, interlocked                    Sreg -> Daddr
                                                         Signal end of interlocked operation
STIK       Store integer immediate value                 src -> Daddr
SUBB       Subtract integers with borrow                 Dreg - src - C -> Dreg
SUBB3      Subtract integers with borrow (3-operand)     src1 - src2 - C -> Dreg
SUBC       Subtract integers conditionally               If Dreg - src >= 0:
                                                           [(Dreg - src) << 1] OR 1 -> Dreg
                                                         Else: Dreg << 1 -> Dreg

Table 2-2. Instruction Set Summary (Concluded)

Mnemonic   Description                                   Operation
SUBF       Subtract floating-point values                Rn - src -> Rn
SUBF3      Subtract floating-point values (3-operand)    src1 - src2 -> Rn
SUBI       Subtract integers                             Dreg - src -> Dreg
SUBI3      Subtract integers (3-operand)                 src1 - src2 -> Dreg
SUBRB      Subtract reverse integer with borrow          src - Dreg - C -> Dreg
SUBRF      Subtract reverse floating-point value         src - Rn -> Rn
SUBRI      Subtract reverse integer                      src - Dreg -> Dreg
SWI        Software interrupt                            Perform emulator interrupt sequence
TOIEEE     Convert to IEEE format                        Convert src to IEEE format -> Rn
TRAPcond   Trap conditionally                            If cond = true or missing:
                                                           Next PC -> *++SP
                                                           Trap vector N -> PC
                                                           0 -> ST(GIE)
                                                         Else: continue
TSTB       Test bit fields                               Dreg AND src
TSTB3      Test bit fields (3-operand)                   src1 AND src2
XOR        Bitwise exclusive-OR                          Dreg XOR src -> Dreg
XOR3       Bitwise exclusive-OR (3-operand)              src1 XOR src2 -> Dreg

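The shift and conditional-subtract rows of Table 2-2 can be checked against a small Python model of their 32-bit behavior. This is a sketch, not the device definition: it only models shift counts from -31 to 31, and it applies SUBC in the classic repeated conditional-subtract division scheme, an illustration chosen here rather than a procedure taken from this chapter. All function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def _check_count(count: int) -> None:
    if not -31 <= count <= 31:
        raise ValueError("this sketch only models counts in -31..31")


def ash(value: int, count: int) -> int:
    """ASH: shift left for count >= 0, else arithmetic (sign-filling) right shift."""
    _check_count(count)
    v = to_signed32(value)
    if count >= 0:
        return (v << count) & MASK32
    return (v >> -count) & MASK32  # Python's >> on a negative int copies the sign bit


def lsh(value: int, count: int) -> int:
    """LSH: shift left for count >= 0, else logical (zero-filling) right shift."""
    _check_count(count)
    v = value & MASK32
    if count >= 0:
        return (v << count) & MASK32
    return v >> -count


def subc(dreg: int, src: int) -> int:
    """One SUBC step, following the operation column of Table 2-2."""
    diff = dreg - src
    return (diff << 1) | 1 if diff >= 0 else dreg << 1


def divide16(dividend: int, divisor: int) -> tuple[int, int]:
    """Unsigned 16-bit division by 16 SUBC steps.

    The divisor is pre-shifted left by 15 bits. Afterward the low 16 bits
    hold the quotient and the upper bits the remainder. Python ints are
    unbounded, so 32-bit overflow on the device is not modeled.
    """
    if not (0 <= dividend <= 0xFFFF and 0 < divisor <= 0xFFFF):
        raise ValueError("operands must be 16-bit and the divisor nonzero")
    acc = dividend
    for _ in range(16):
        acc = subc(acc, divisor << 15)
    return acc & 0xFFFF, acc >> 16


print(hex(ash(0xFFFF_FFF0, -2)), hex(lsh(0xFFFF_FFF0, -2)))  # 0xfffffffc 0x3ffffffc
print(divide16(65535, 7))  # (9362, 1)
```

The two shift functions differ only in what fills the vacated high bits on a right shift, which is the practical difference between ASH and LSH.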

TMS320C40 Instruction Set: Parallel Instructions

Table 2-3. Parallel Instruction Set Summary

Mnemonic   Description                                   Operation
Parallel Arithmetic With Store Instructions
ABSF       Absolute value of a floating-point            |src2| -> dst1
|| STF                                                   || src3 -> dst2
ABSI       Absolute value of an integer                  |src2| -> dst1
|| STI                                                   || src3 -> dst2
ADDF3      Add floating-point                            src1 + src2 -> dst1
|| STF                                                   || src3 -> dst2
ADDI3      Add integer                                   src1 + src2 -> dst1
|| STI                                                   || src3 -> dst2
AND3       Bitwise logical-AND                           src1 AND src2 -> dst1
|| STI                                                   || src3 -> dst2
ASH3       Arithmetic shift                              If count > 0:
|| STI                                                     src2 << count -> dst1
                                                           || src3 -> dst2
                                                         Else:
                                                           src2 >> |count| -> dst1
                                                           || src3 -> dst2
FIX        Convert floating-point to integer             Fix(src2) -> dst1
|| STI                                                   || src3 -> dst2
FLOAT      Convert integer to floating-point             Float(src2) -> dst1
|| STF                                                   || src3 -> dst2
FRIEEE     Parallel FRIEEE and STF                       Convert src2 from IEEE format -> dst1
|| STF                                                   || src3 -> dst2
LDF        Load floating-point                           src2 -> dst1
|| STF                                                   || src3 -> dst2
LDI        Load integer                                  src2 -> dst1
|| STI                                                   || src3 -> dst2
LSH3       Logical shift                                 If count > 0:
|| STI                                                     src2 << count -> dst1
                                                           || src3 -> dst2
                                                         Else:
                                                           src2 >> |count| -> dst1
                                                           || src3 -> dst2

LEGEND (for parallel instructions):
src1    register addr (R0-R11)
src2    indirect addr (disp = 0, 1, IR0, IR1)
src3    register addr (R0-R11)
src4    indirect addr (disp = 0, 1, IR0, IR1)
dst1    register addr (R0-R11)
dst2    indirect addr (disp = 0, 1, IR0, IR1)
op3     register addr (R0 or R1)
op6     register addr (R2 or R3)
op1, op2, op4, op5:  Two of these operands must be specified using register
                     addr, and two must be specified using indirect addr.

Mnemonic   Description                                   Operation
Parallel Arithmetic With Store Instructions (Concluded)
MPYF3      Multiply floating-point and store             src1 x src2 -> dst1
|| STF                                                   || src3 -> dst2
MPYI3      Multiply integer                              src1 x src2 -> dst1
|| STI                                                   || src3 -> dst2
NEGF       Negate floating-point                         0 - src2 -> dst1
|| STF                                                   || src3 -> dst2
NEGI       Negate integer                                0 - src2 -> dst1
|| STI                                                   || src3 -> dst2
NOT        Complement                                    NOT src1 -> dst1
|| STI                                                   || src3 -> dst2
OR3        Bitwise logical-OR                            src1 OR src2 -> dst1
|| STI                                                   || src3 -> dst2
STF        Store floating-point                          src1 -> dst1
|| STF                                                   || src3 -> dst2
STI        Store integer                                 src1 -> dst1
|| STI                                                   || src3 -> dst2
SUBF3      Subtract floating-point                       src1 - src2 -> dst1
|| STF                                                   || src3 -> dst2
SUBI3      Subtract integer                              src1 - src2 -> dst1
|| STI                                                   || src3 -> dst2
TOIEEE     Convert to IEEE floating-point format         Convert src2 to IEEE format -> dst1
|| STF                                                   || src3 -> dst2
XOR3       Bitwise exclusive-OR                          src1 XOR src2 -> dst1
|| STI                                                   || src3 -> dst2


Table 2-3. Parallel Instruction Set Summary (Concluded)

Mnemonic   Description                                   Operation
Parallel Load Instructions
LDF        Load floating-point                           src2 -> dst1
|| LDF                                                   || src4 -> dst2
LDF        Load floating-point and store floating-point  src2 -> dst1
|| STF                                                   || src3 -> dst2
LDI        Load integer                                  src2 -> dst1
|| LDI                                                   || src4 -> dst2
LSH3       Logical shift (3-operand) and store integer   If count > 0:
|| STI                                                     src2 << count -> dst1
                                                         Else:
                                                           src2 >> |count| -> dst1
                                                         || src3 -> dst2
LSH3       Logical shift 3 and store integer             src2 -> dst1
|| STI                                                   || src3 -> dst2
Parallel Multiply And Add/Subtract Instructions
MPYF3      Multiply and add floating-point               op1 x op2 -> op3
|| ADDF3                                                 || op4 + op5 -> op6
MPYF3      Multiply and subtract floating-point          op1 x op2 -> op3
|| SUBF3                                                 || op4 - op5 -> op6
MPYI3      Multiply and add integer                      op1 x op2 -> op3
|| ADDI3                                                 || op4 + op5 -> op6
MPYI3      Multiply and subtract integer                 op1 x op2 -> op3
|| SUBI3                                                 || op4 - op5 -> op6

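One common use of the parallel multiply-and-add form is a software-pipelined dot product, in which each step multiplies one pair of operands while adding the previous product into a running sum. The Python sketch below models only that dataflow; it ignores timing and the 'C40 floating-point format, and the register names in the comments (R0 for op3, R2 for op6) are one choice the legend permits, not a required assignment. The function name is hypothetical.

```python
def dot_product_pipelined(a: list[float], b: list[float]) -> float:
    """Dot product arranged like a loop of MPYF3 || ADDF3 steps."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0.0      # running sum (op6, e.g. R2)
    product = 0.0  # product from the previous step (op3, e.g. R0)
    for x, y in zip(a, b):
        # Both right-hand sides read values from before this step,
        # mirroring the two operations issued in parallel:
        #   op1 x op2 -> op3  ||  op4 + op5 -> op6
        product, acc = x * y, acc + product
    return acc + product  # drain: add the last product after the loop


print(dot_product_pipelined([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Because the adder always consumes the product from the previous step, the loop needs one extra addition after the last multiply.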

2.4 Internal Bus Operation

A large portion of the TMS320C40's high performance is due to internal bus- 
ing and parallelism. Separate buses allow for parallel program fetches, data 
accesses, and DMA accesses: 

□ program buses PADDR and PDATA 

□ data buses DADDR1, DADDR2, and DDATA

□ DMA buses DMAADDR and DMADATA 

These buses connect all of the physical spaces (on-chip memory, off-chip 
memory, and on-chip peripherals) supported by the TMS320C40. 
Figure 2-3 shows these internal buses and their connection to on-chip and 
off-chip memory blocks. 

The program counter (PC) is connected to the 32-bit program address bus 
(PADDR). The instruction register (IR) is connected to the 32-bit program 
data bus (PDATA). These buses can fetch a single instruction word every 
machine cycle. 

The 32-bit data address buses (DADDR1 and DADDR2) and the 32-bit data
data bus (DDATA) support two data memory accesses every machine cycle.
The DDATA bus carries data to the CPU over the CPU1 and CPU2 buses. 
The CPU1 and CPU2 buses can carry two data memory operands to the 
multiplier, ALU, and register file every machine cycle. Also internal to the 
CPU are register buses REG1 and REG2, which can carry two data values 
from the register file to the multiplier and ALU every machine cycle. 
Figure 2-2 shows the buses internal to the CPU section of the processor. 

The DMA controller is supported with a 32-bit address bus (DMAADDR) and 
a 32-bit data bus (DMADATA). These buses allow the DMA to perform 
memory accesses in parallel with the memory accesses occurring from the 
data and program buses. 

2.5 External Bus Operation 

The TMS320C40 provides two identical external interfaces: the global 
memory interface and the local memory interface. Each consists of a 32-bit 
data bus, a 31-bit address bus, and two sets of control signals. Both buses
can be used to address external program/data memory or I/O space. The
buses also have external RDY signals for wait-state generation with wait 
states inserted under software control. Chapter 7 covers external bus oper- 
ation. 



2.5.1 Interrupts 

The TMS320C40 supports four external interrupts (IIOF3-0), a number of 
internal interrupts, a nonmaskable, external NMI interrupt, and a nonmask- 
able external RESET signal, which sets the processor to a known state. The 
DMA and communication ports have their own internal interrupts. When the
CPU responds to the interrupt, the IACK pin can be used to signal an exter- 
nal interrupt acknowledge. Section 6.7 (beginning on page 6-23) covers 
RESET and interrupt processing. 



2.5.2 Interlocked Instructions 

For multiple processors to access global memory and share data in a
coherent manner, arbitration is necessary. This arbitration (handshaking) is
the purpose of the TMS320C40's interlocked operations, which are handled
through the interlocked instructions (explained in Section 6.4 on page 6-11).

2.6 Peripherals

All TMS320C40 peripherals are controlled through memory-mapped registers on
a dedicated peripheral bus. This peripheral bus is composed of a 32-bit data
bus and a 32-bit address bus and permits straightforward communication with
the peripherals. The TMS320C40 peripherals include two timers, six
communication ports, and a six-channel DMA coprocessor. Figure 2-6 shows the
peripherals with associated buses and signals.

Figure 2-6. Peripheral Modules 

2.6.1 Communication Ports 

Six high-speed communication ports provide rapid processor-to-processor
communication through each port's dedicated communication interface.
Coupled with the 'C40's two memory interfaces (global and local), this lets
you construct a parallel processor system that attains optimum system
performance by distributing tasks among several processors. Each 'C40 can
pass the results of its work to another, enabling each 'C40 to continue
working. Chapter 8 explains communication port operation in detail.

Communication port features: 

□ 160-megabit per second (20-Mbytes or 5-Mwords per second) 
bidirectional data transfer operations (at 40-ns cycle time) 

□ direct (glueless) processor-to-processor communication via eight data 
lines and four control lines 

□ buffering of all data transfers, both input and output 

□ automatic arbitration provided to ensure communication synchroniza- 
tion 

□ synchronization between the CPU or direct-memory access (DMA) 
coprocessor and the six communication ports via internal interrupts and 
internal ready signals. 
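The three rates in the first bullet are the same figure in different units. The Python sketch below makes the conversion explicit and estimates an ideal block-transfer time; `transfer_time_us` is a hypothetical helper, and it ignores arbitration, buffering, and any per-transfer overhead.

```python
PORT_BITS_PER_SECOND = 160_000_000  # quoted peak rate at a 40-ns cycle time
BITS_PER_BYTE = 8
BYTES_PER_WORD = 4                  # one 32-bit 'C40 word

bytes_per_second = PORT_BITS_PER_SECOND // BITS_PER_BYTE  # 20,000,000
words_per_second = bytes_per_second // BYTES_PER_WORD     # 5,000,000


def transfer_time_us(words: int) -> float:
    """Ideal time, in microseconds, to move `words` words at the peak rate."""
    return words / words_per_second * 1e6


print(bytes_per_second, words_per_second)  # 20000000 5000000
```

At the peak rate, for example, a 1000-word block would take about 200 microseconds.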

2.6.2 Direct Memory Access (DMA) 

The six channels of the on-chip Direct Memory Access (DMA) coprocessor 
can read from or write to any location in the memory map without interfering 
with the operation of the CPU. This allows interfacing to slow external
memories and peripherals without reducing throughput to the CPU. The DMA
coprocessor contains its own address generators, source and destination
registers, and transfer counter. Dedicated DMA address and data buses
minimize conflicts between the CPU and the DMA coprocessor. A DMA
operation consists of a block or single-word transfer to or from
memory. A key feature of the DMA coprocessor is its ability to automatically 
reinitialize each channel following a data transfer. Refer to Chapter 9 for de- 
tailed information on the DMA coprocessor. 
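The channel behavior described above (its own address generation, a transfer counter, and reinitialization after a transfer) can be pictured with a small Python model. This is a hypothetical sketch: the field names, the index-based address updates, and modeling reinitialization as following a link to the next set of channel settings are assumptions for illustration only; Chapter 9 gives the actual register set and reinitialization mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DmaChannel:
    """Hypothetical channel settings; names are illustrative only."""
    src: int                               # source address
    dst: int                               # destination address
    src_index: int                         # added to src after each word
    dst_index: int                         # added to dst after each word
    count: int                             # transfer counter (words to move)
    link: Optional["DmaChannel"] = None    # settings loaded on reinitialization


def run_channel(memory: list[int], ch: Optional[DmaChannel]) -> int:
    """Run block transfers, reinitializing from `link` until none remains.

    Returns the total number of words moved.
    """
    moved = 0
    while ch is not None:
        src, dst = ch.src, ch.dst
        for _ in range(ch.count):
            memory[dst] = memory[src]
            src += ch.src_index
            dst += ch.dst_index
            moved += 1
        ch = ch.link  # reinitialize the channel with the next settings
    return moved


memory = list(range(8)) + [0] * 8
second = DmaChannel(src=4, dst=12, src_index=1, dst_index=1, count=2)
first = DmaChannel(src=0, dst=8, src_index=1, dst_index=1, count=4, link=second)
print(run_channel(memory, first), memory[8:14])  # 6 [0, 1, 2, 3, 4, 5]
```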

2.6.3 Timers 

The two timer modules are general-purpose 32-bit timer/event counters 
with two signaling modes and internal or external clocking. They can signal 
internally to the 'C40 or externally to the outside world at specified intervals, 
or they can count external events. Each timer has an I/O pin that can be used 
as an input clock to the timer, as an output signal driven by the timer, or as 
a general-purpose I/O pin. Timers are described in detail in Section 9.1 0 on 
page 9-45. 

Chapter 3



CPU Registers, Memory, and Cache 



The CPU primary register file contains 32 registers that can be used as 
operands by the multiplier and ALU (arithmetic logic unit). The register file 
includes the auxiliary registers, extended-precision registers, and index 
registers. These registers support addressing, floating-point/integer opera- 
tions, stack management, processor status, block repeats, branching, and 
interrupts. 

The CPU expansion register file contains two registers — the interrupt 
vector table pointer (IVTP) and the trap vector table pointer (TVTP). 

The TMS320C40 accesses a total memory space of 4G (2^32) 32-bit words
(16 gigabytes) of program, data, and I/O space. Two internal RAM blocks of
1K x 32 bits each (4K bytes each) and an internal ROM block containing a
boot loader permit two accesses per block in a single cycle.

A 128 x 32-bit instruction cache stores often-repeated sections of code. The
cache greatly reduces the number of off-chip accesses, allowing code to be
stored off-chip in slower, lower-cost memories without degrading
performance. The cache also speeds data fetches to the same physical space
as the program by not burdening the bus with program instruction fetches.
Three bits in the CPU status register (CC, CE, and CF) clear, enable, and
freeze the cache.

This chapter describes in detail each of the CPU registers, the memory 
maps, and the instruction cache. Major topics are as follows: 



Section Page 

3.1 CPU Primary Register File 3-3

■ Extended-Precision Registers (R0-R11) 3-4

■ Auxiliary Registers (AR0-AR7) 3-5

■ Data-Page Pointer (DP) 3-5

■ Index Registers (IR0, IR1) 3-5

■ Block-Size Register (BK) 3-5

■ System Stack Pointer (SP) 3-5

■ Status Register (ST) 3-5

■ DMA Interrupt Enable Register (DIE) 3-8

■ Internal Interrupt Enable Register (IIE) 3-10

■ Interrupt Flag Register (IIF) Controls External Pins
IIOF(3-0), Timer/DMA Flags 3-12

■ Block-Repeat (RS, RE) and Repeat-Count (RC) Registers 3-14



3.1 CPU Primary Register File 

The TMS320C40 provides 32 registers in a multiport register file that is
tightly coupled to the CPU. The PC (program counter) is not included in the
32 registers. The registers' names and assigned functions are listed in
Table 3-1.

All of these registers can be used as operands by the multiplier and ALU,
and can be used as general-purpose 32-bit registers. However, the registers
also have some special functions for which they are particularly appropriate.
For example, the 12 extended-precision registers are especially well suited
for maintaining extended-precision floating-point results. The eight
auxiliary registers support a variety of indirect addressing modes and can be
used as general-purpose 32-bit integer and logical registers. The remaining
registers provide system functions such as addressing, stack management,
processor status, interrupts, and block repeat. Refer to Chapter 5 for
detailed information and examples of the use of CPU registers in addressing.

Table 3-1. CPU Primary Register File

Assembler   Machine       Assigned Function Name                      See         On
Syntax      Value (hex)                                               Paragraph   Page
R0          00            Extended-precision register 0               3.1.1       3-4
R1          01            Extended-precision register 1               3.1.1       3-4
R2          02            Extended-precision register 2               3.1.1       3-4
R3          03            Extended-precision register 3               3.1.1       3-4
R4          04            Extended-precision register 4               3.1.1       3-4
R5          05            Extended-precision register 5               3.1.1       3-4
R6          06            Extended-precision register 6               3.1.1       3-4
R7          07            Extended-precision register 7               3.1.1       3-4
R8          1C            Extended-precision register 8               3.1.1       3-4
R9          1D            Extended-precision register 9               3.1.1       3-4
R10         1E            Extended-precision register 10              3.1.1       3-4
R11         1F            Extended-precision register 11              3.1.1       3-4
AR0         08            Auxiliary register 0                        3.1.2       3-5
AR1         09            Auxiliary register 1                        3.1.2       3-5
AR2         0A            Auxiliary register 2                        3.1.2       3-5
AR3         0B            Auxiliary register 3                        3.1.2       3-5
AR4         0C            Auxiliary register 4                        3.1.2       3-5
AR5         0D            Auxiliary register 5                        3.1.2       3-5
AR6         0E            Auxiliary register 6                        3.1.2       3-5
AR7         0F            Auxiliary register 7                        3.1.2       3-5
DP          10            Data-page pointer                           3.1.3       3-5
IR0         11            Index register 0                            3.1.4       3-5
IR1         12            Index register 1                            3.1.4       3-5
BK          13            Block-size register                         3.1.5       3-5
SP          14            System stack pointer                        3.1.6       3-5
ST          15            Status register                             3.1.7       3-5
DIE         16            DMA coprocessor interrupt enable            3.1.8       3-8
IIE         17            Internal-interrupt enable register          3.1.9       3-10
IIF         18            IIOF flag register (IIOF3-0, timers, DMA)   3.1.10      3-12
RS          19            Repeat start address                        3.1.11      3-14
RE          1A            Repeat end address                          3.1.11      3-14
RC          1B            Repeat counter                              3.1.11      3-14



3.1.1 Extended-Precision Registers (R0-R11)

The 12 extended-precision registers (R0-R11) can store and support
operations on 32-bit integer and 40-bit floating-point numbers. These
registers consist of two separate and distinct regions:

□ bits 39-32: dedicated to storage of the exponent (e) of the floating-point
number.

□ bits 31-0: store the mantissa of the floating-point number:

■ bit 31: sign bit (s),

■ bits 30-0: the fraction (f).

Any instruction that assumes the operands are floating-point numbers uses
bits 39-0. Figure 3-1 illustrates the storage of 40-bit floating-point numbers
in the extended-precision registers.

Figure 3-1. Extended-Precision Register Floating-Point Format

 39          32  31  30                                  0
+--------------+---+-------------------------------------+
| exponent (e) | s |            fraction (f)             |
+--------------+---+-------------------------------------+
               |<-------------- mantissa --------------->|

For integer operations, bits 31-0 of the extended-precision registers contain
the integer (signed or unsigned). Any instruction that assumes the operands
are either signed or unsigned integers uses only bits 31-0. Bits 39-32 remain
unchanged. This is true for all shift operations. The storage of 32-bit
integers in the extended-precision registers is shown in Figure 3-2.

Figure 3-2. Extended-Precision Register Integer Format

 39          32  31                                      0
+--------------+-----------------------------------------+
|  unchanged   |       signed or unsigned integer        |
+--------------+-----------------------------------------+
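The two register layouts above can be modeled on a host, for example to check code that packs or unpacks register images. The following is a minimal C sketch. It assumes the 40-bit register is held in the low 40 bits of a uint64_t. All names (ext_exponent, ext_write_int, and so on) are hypothetical and are not a TI API. Reading the exponent as an 8-bit two's-complement value is an assumption based on the format covered in Chapter 4.

```c
#include <stdint.h>

/* Hypothetical host-side model of a 40-bit extended-precision register
   (R0-R11), stored in the low 40 bits of a uint64_t. */
#define EXT_MASK 0xFFFFFFFFFFull  /* bits 39-0 */
#define LOW32    0xFFFFFFFFull    /* bits 31-0 */

/* Bits 39-32: exponent e, read here as an 8-bit two's-complement value
   (assumption; Chapter 4 defines the floating-point format). */
static int ext_exponent(uint64_t r) {
    int e = (int)((r >> 32) & 0xFFu);
    return e >= 128 ? e - 256 : e;
}

/* Bit 31: sign bit s. */
static unsigned ext_sign(uint64_t r) {
    return (unsigned)((r >> 31) & 1u);
}

/* Bits 30-0: fraction f. */
static uint32_t ext_fraction(uint64_t r) {
    return (uint32_t)(r & 0x7FFFFFFFu);
}

/* Integer instructions use only bits 31-0. */
static uint32_t ext_read_int(uint64_t r) {
    return (uint32_t)(r & LOW32);
}

/* An integer result replaces bits 31-0; bits 39-32 remain unchanged. */
static uint64_t ext_write_int(uint64_t r, uint32_t value) {
    return (r & EXT_MASK & ~LOW32) | value;
}
```

The integer write masks off only bits 31-0. This mirrors the rule that integer and shift operations leave the exponent field untouched.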




3.1.2 Auxiliary Registers (AR0-AR7)

The eight 32-bit auxiliary registers (AR0-AR7) can be accessed by the CPU
and modified by the two auxiliary register arithmetic units (ARAUs). The
primary function of the auxiliary registers is the generation of 32-bit
addresses. However, they can also operate as loop counters in indirect
addressing or as 32-bit general-purpose registers that can be modified by
the multiplier and ALU. Refer to Chapter 5 for detailed information and
examples of the use of auxiliary registers in addressing.

3.1.3 Data-Page Pointer (DP)

The data-page pointer (DP) is a 32-bit register whose 16 LSBs are used
by the direct addressing mode as a pointer to the page of data being
addressed. Data pages are 64K words long, with a total of 64K (65,536)
pages. Bits 31-16 are reserved; they are always read as zeroes and should
not be modified by writing to the register. The DP can be loaded by using
the LDP pseudo-instruction or the LDI instruction. Figure 5-1 on page 5-4
describes this register's function.
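Sixty-four thousand pages of 64K words each span the full 4G-word address space. A direct address can therefore be pictured as the DP page number in the upper 16 bits and a 16-bit offset in the lower 16 bits. The following C sketch assumes the 16-bit offset is supplied by the instruction; Chapter 5 gives the exact encoding. The name direct_address is hypothetical.

```c
#include <stdint.h>

/* Forms a 32-bit direct address: DP bits 15-0 select the 64K-word page and
   the 16-bit offset (assumed to come from the instruction) selects the word.
   DP bits 31-16 are reserved and are ignored here. */
static uint32_t direct_address(uint32_t dp, uint16_t offset) {
    return ((dp & 0xFFFFu) << 16) | offset;
}
```

For example, DP = 002Fh with offset F800h selects address 002F F800h.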

3.1.4 Index Registers (IR0, IR1)

The 32-bit index registers (IR0 and IR1) are used by the auxiliary register
arithmetic unit (ARAU) for indexing the address. IR0 is also used for
bit-reversed addressing. Refer to Chapter 5 for detailed information and
examples of the use of index registers in addressing. Subsection 5.1.3 on
page 5-5 covers their use in indirect addressing (see the examples starting
on page 5-12), and Section 5.4 on page 5-30 describes their use in
bit-reversed addressing.

3.1.5 Block-Size Register (BK)

The 32-bit block-size register (BK) is used by the ARAU in circular
addressing to specify the data block size (see Section 5.3 on page 5-25).

3.1.6 System Stack Pointer (SP)

The system stack pointer (SP) is a 32-bit register that contains the address
of the top of the system stack. The SP always points to the last element
pushed onto the stack. The SP is manipulated by interrupts, traps, calls,
returns, and the PUSH, PUSHF, POP, and POPF instructions. Pushes and
pops of the stack perform preincrement and postdecrement, respectively,
on all 32 bits of the SP. Refer to Section 5.5 on page 5-31 for information
about system stack management.
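The preincrement/postdecrement rule keeps SP pointing at the last element pushed. The following minimal C sketch models that rule. The struct and function names are hypothetical. The 64-word array is only a stand-in for data memory, indexed modulo its size so the model stays in bounds.

```c
#include <stdint.h>

#define STACK_WORDS 64u   /* size of the toy memory used by this model */

/* Hypothetical model: mem[] stands in for data memory and sp holds a word
   address. SP always points at the last element pushed. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;
} stack_model;

/* PUSH: preincrement SP, then store at the new top of stack. */
static void stack_push(stack_model *s, uint32_t value) {
    s->sp += 1u;
    s->mem[s->sp % STACK_WORDS] = value;  /* modulo keeps the toy array in bounds */
}

/* POP: read the top element, then postdecrement SP. */
static uint32_t stack_pop(stack_model *s) {
    uint32_t value = s->mem[s->sp % STACK_WORDS];
    s->sp -= 1u;
    return value;
}
```

Starting from SP = 10, two pushes leave SP = 12 with the second value at address 12, and two pops return SP to 10.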

3.1.7 Status Register (ST)

The status register (ST) contains global information relating to the CPU
state. Typically, operations set the condition flags of the status register
according to whether the result is zero, negative, etc. This includes register
load and store operations as well as arithmetic and logical functions.
However, when the ST is loaded, the contents of the load instruction's source
operand replace the current contents of the ST bit for bit, regardless of the
state of any bit(s) in the source operand. Therefore, following an ST load, the
contents of the ST are identical to the contents of the source operand. This
allows the status register to be easily saved and restored. At system reset,
0 is written to this register.

The format of the status register is shown in Figure 3-3. Table 3-2 defines 
the status register bits, their names, and functions. 

Figure 3-3. Status Register

31-17     16        15        14        13        12
xx        ANALYSIS  SET COND  PGIE      GIE       CC
R         R         R/W       R/W       R/W       R/W

11    10    9     8     7     6     5     4     3     2     1     0
CE    CF    PCF   RM    OVM   LUF   LV    UF    N     Z     V     C
R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W   R/W

NOTE: xx = reserved bit.

R = read, W = write.



Table 3-2. Status Register Bits Summary

Bit     Bit Field Name   Function

0†      C                Carry condition flag

1†      V                Overflow condition flag

2†      Z                Zero condition flag

3†      N                Negative condition flag

4†      UF               Floating-point underflow condition flag

5†      LV               Latched overflow condition flag

6†      LUF              Latched floating-point underflow condition flag

7       OVM              Overflow mode flag. This flag affects only integer
                         operations.
                         If OVM = 0, the overflow mode is turned off; integer
                         results that overflow are treated in no special way.
                         If OVM = 1,
                         a) integer results overflowing in the positive
                            direction are set to the most positive 32-bit
                            twos-complement number (7FFF FFFFh).
                         b) integer results overflowing in the negative
                            direction are set to the most negative 32-bit
                            twos-complement number (8000 0000h).
                         Note that the functions of bits V and LV are
                         independent of the setting of OVM.

8       RM               Repeat mode flag. If RM = 1, the PC is being
                         modified in either the repeat-block or repeat-single
                         mode.

9       PCF              Previous state of bit CF. When a trap executes or an
                         interrupt is taken, bit CF is set to 1. When this
                         occurs, the PCF bit is set to the CF bit's value
                         before the trap or interrupt. Note that the RETI and
                         RETID instructions copy PCF to the CF bit.

10      CF               Cache freeze. Set CF = 1 to freeze the cache (the
                         cache is not updated), including LRU (least recently
                         used) stack manipulation. If the cache is enabled
                         (CE = 1), fetches from the cache are allowed, but
                         modification of the cache contents is not allowed.
                         Cache clearing (CC = 1) is allowed whether CF = 0 or
                         CF = 1. At reset, this bit is set to 0. CF is set to
                         1 when a trap or interrupt is taken, and the RETI and
                         RETID instructions copy PCF to the CF bit.

11      CE               Cache enable. Set CE = 1 to enable the cache,
                         allowing the cache to be used according to the LRU
                         (least recently used) cache algorithm. Set CE = 0 to
                         disable the cache, which prevents cache updates and
                         modifications (thus, no cache fetches can be made).
                         At reset, 0 is written to this bit. Cache clearing
                         (CC = 1) is allowed when CE = 0. The combinations of
                         the CE and CF bits have the following effects:
                           CE  CF  Effect
                           0   0   Cache not enabled
                           0   1   Cache not enabled
                           1   0   Cache enabled and not frozen
                           1   1   Cache enabled but frozen (cache read only)

12      CC               Cache clear. CC = 1 invalidates all entries in the
                         cache (contents are not guaranteed). This bit is
                         always cleared after it is written to and thus always
                         reads as 0. At reset, 0 is written to this bit. All
                         cache P flags are 0 when the cache is cleared.

13      GIE              Global interrupt enable. If GIE = 1, the CPU responds
                         to an enabled interrupt. If GIE = 0, the CPU does not
                         respond to an enabled interrupt (when a trap executes
                         or an interrupt is taken, bit GIE is set to 0). This
                         bit does not affect interrupts on the NMI pin. The
                         IDLE, LAT, RETI, RETID, and TRAP instructions affect
                         this bit's value.

14      PGIE             Previous state of bit GIE. When a trap executes or an
                         interrupt is taken, bit GIE is set to 0. When this
                         occurs, the PGIE bit is set to the GIE bit's value
                         before the trap or interrupt. Note that the RETIcond
                         and RETIcondD instructions copy PGIE to the GIE bit.
                         At reset, this bit is set to 0.

15      SET COND         This bit determines how condition flags (ST bits
                         0-6) are set:
                         If SET COND = 0, condition flags are set if the
                         operation's target is any extended-precision
                         register (R0-R11); this is compatible with the
                         TMS320C30. This bit is set to 0 at reset.
                         If SET COND = 1, condition flags are set if the
                         target of the operation is any register in the
                         primary register file except the status register.
                         Condition flags are always set when a CMPF, CMPI,
                         CMPF3, CMPI3, TSTB, or TSTB3 instruction is executed.

16      ANALYSIS         Analysis mode: state information for emulation.
                         Read only.

17-31   Reserved         Value undefined. Read only. Reserved for an
                         identification value. This value is set by Texas
                         Instruments (e.g., to identify device types and
                         revisions).

† The seven condition flags (ST bits 0-6) are defined in Section 11.2 on
page 11-10.
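Several entries in Table 3-2 describe rules that are easy to misremember: the CE/CF cache combinations, the GIE/PGIE and CF/PCF save-and-restore on interrupts, and OVM saturation. The following C sketch models those rules on a host. Bit positions come from Table 3-2. All function names are hypothetical. The add models only V and LV, not the other condition flags, and is a simplification rather than a description of the CPU's internal logic.

```c
#include <stdint.h>

/* ST bit masks, from Table 3-2. */
#define ST_V    (1u << 1)
#define ST_LV   (1u << 5)
#define ST_OVM  (1u << 7)
#define ST_PCF  (1u << 9)
#define ST_CF   (1u << 10)
#define ST_CE   (1u << 11)
#define ST_GIE  (1u << 13)
#define ST_PGIE (1u << 14)

typedef enum { CACHE_DISABLED, CACHE_ENABLED, CACHE_FROZEN } cache_state;

/* CE/CF combinations from the CE entry of Table 3-2. */
static cache_state st_cache_state(uint32_t st) {
    if (!(st & ST_CE))
        return CACHE_DISABLED;                      /* CE = 0: CF has no effect */
    return (st & ST_CF) ? CACHE_FROZEN : CACHE_ENABLED;
}

/* Copies the bit selected by from_mask into the bit selected by to_mask. */
static uint32_t copy_bit(uint32_t st, uint32_t from_mask, uint32_t to_mask) {
    return (st & from_mask) ? (st | to_mask) : (st & ~to_mask);
}

/* Trap/interrupt entry: PGIE <- GIE and PCF <- CF, then GIE = 0 and CF = 1. */
static uint32_t st_on_interrupt(uint32_t st) {
    st = copy_bit(st, ST_GIE, ST_PGIE);
    st = copy_bit(st, ST_CF, ST_PCF);
    return (st & ~ST_GIE) | ST_CF;
}

/* Return: GIE <- PGIE (RETIcond) and CF <- PCF (RETI/RETID). */
static uint32_t st_on_return(uint32_t st) {
    st = copy_bit(st, ST_PGIE, ST_GIE);
    return copy_bit(st, ST_PCF, ST_CF);
}

/* 32-bit integer add that honors OVM. V reflects this operation and LV
   latches any overflow; other flags are not modeled. */
static uint32_t ovm_add(uint32_t a, uint32_t b, uint32_t *st) {
    uint32_t sum = a + b;                                /* wraps modulo 2^32 */
    int overflow = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;  /* same-sign inputs, sign flipped */
    if (overflow) {
        *st |= ST_V | ST_LV;
        if (*st & ST_OVM)
            sum = (a >> 31) ? 0x80000000u : 0x7FFFFFFFu; /* saturate */
    } else {
        *st &= ~ST_V;
    }
    return sum;
}
```

With OVM = 1, adding 1 to 7FFF FFFFh saturates at 7FFF FFFFh. With OVM = 0, the same add wraps to 8000 0000h. In both cases V and LV are set, matching the note that V and LV are independent of OVM.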
3.1.8 DMA Coprocessor Interrupt Enable Register (DIE)

The 32-bit DMA interrupt enable register (DIE), shown in Figure 3-4, is
divided into subfields (a read field and a write field for each of the six
DMA coprocessor channels) that determine which interrupts can be used to
control the synchronization of each channel. At reset, all zeroes are
written to the register.



Figure 3-4. DMA Interrupt Enable Register Bit Functions

31-29        28-26        25-23        22-20
DMA5 WRITE   DMA5 READ    DMA4 WRITE   DMA4 READ

19-17        16-14        13-11        10-8
DMA3 WRITE   DMA3 READ    DMA2 WRITE   DMA2 READ

7-6          5-4          3-2          1-0
DMA1 WRITE   DMA1 READ    DMA0 WRITE   DMA0 READ

All fields are R/W.
R = Read, W = Write



Table 3-3 summarizes the interrupt enabled by each of the four possible
two-bit values in the DMA0 and DMA1 fields (bits 7-0 in Figure 3-4).
Likewise, Table 3-4 (page 3-9) summarizes the interrupts enabled by the
three-bit values in the DMA2 through DMA5 fields.



Note: DMA Coprocessor Uses Signals to Synchronize

The interrupts in Table 3-3 and Table 3-4 (ICRDYx, OCRDYx, TIM0,
etc.) are not vectored. The DMA coprocessor uses them as signals to
synchronize its transfers. This is explained in Section 9.9 on page
9-40.
Table 3-3. DMA Channels 0 and 1 Synchronization Interrupts (DMA0 and DMA1)

Bit Value     Interrupt Enabled at DMA0 or DMA1          Interrupt Source for
(in DMA0      DMA0      DMA0      DMA1      DMA1         DMA Synchronization
or DMA1)      Read      Write     Read      Write
00            None      None      None      None
01            ICRDY0    OCRDY0    ICRDY1    OCRDY1       From communication port
10            IIOF0     IIOF1     IIOF2     IIOF3        From external pins IIOF0-IIOF3
11            TIM0      TIM0      TIM0      TIM0         From timer TIM0



This interrupt synchronization scheme allows each DMA channel to service 
a corresponding input communication port and output communication port. 
Also, each DMA channel can be synchronized with external interrupts and 
the on-chip timers. 



Table 3-4. DMA Channels 2 to 5 Synchronization Interrupts (DMA2 to DMA5)

Bit Value           Interrupt Enabled at DMA2-DMA5†    Interrupt Source for
(in DMA2 to DMA5)   DMAx Read†     DMAx Write†         DMA Synchronization
000                 None           None
001                 ICRDYx†        OCRDYx†             From communication port
010                 IIOF0          IIOF0               From external pins IIOF0-IIOF3
011                 IIOF1          IIOF1               From external pins IIOF0-IIOF3
100                 IIOF2          IIOF2               From external pins IIOF0-IIOF3
101                 IIOF3          IIOF3               From external pins IIOF0-IIOF3
110                 TIM0           TIM0                From timers TIM0 and TIM1
111                 TIM1           TIM1                From timers TIM0 and TIM1

† The x in DMAx is the DMA channel number, which is also the number for the
corresponding ICRDYx and OCRDYx interrupts. For example, 001b in both
DMA2 READ and DMA5 WRITE would enable interrupts ICRDY2 and OCRDY5,
respectively. All other valid bit values (010b to 111b) are the same (as
shown in the table) for DMA2 through DMA5.

Note that each DMA channel looks not only at the DMA synchronization
interrupts selected but also at the synchronization mode that the channel is
currently using (see Table 9-4 on page 9-15). The synchronization mode is
specified by the DMA channel control registers located in the DMA
coprocessor.
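The field positions in Figure 3-4 follow a regular pattern: 2-bit fields for channels 0 and 1 in bits 7-0, then 3-bit fields for channels 2 through 5 in bits 31-8. The following C sketch computes a DIE value from per-channel selections. The helper names (die_set, die_get) are hypothetical. The sketch only builds the 32-bit value; loading it into DIE is not shown.

```c
#include <stdint.h>

typedef enum { DIE_READ = 0, DIE_WRITE = 1 } die_dir;

/* Field width: 2 bits for DMA0/DMA1, 3 bits for DMA2-DMA5 (Figure 3-4). */
static unsigned die_width(unsigned channel) {
    return channel < 2u ? 2u : 3u;
}

/* Bit offset of the READ or WRITE field for a channel (0-5). */
static unsigned die_shift(unsigned channel, die_dir dir) {
    if (channel < 2u)
        return channel * 4u + (unsigned)dir * 2u;        /* bits 7-0 */
    return 8u + (channel - 2u) * 6u + (unsigned)dir * 3u; /* bits 31-8 */
}

/* Returns die with one field replaced by value (masked to the field width). */
static uint32_t die_set(uint32_t die, unsigned channel, die_dir dir, uint32_t value) {
    unsigned shift = die_shift(channel, dir);
    uint32_t mask = ((1u << die_width(channel)) - 1u) << shift;
    return (die & ~mask) | ((value << shift) & mask);
}

/* Reads one field back. */
static uint32_t die_get(uint32_t die, unsigned channel, die_dir dir) {
    return (die >> die_shift(channel, dir)) & ((1u << die_width(channel)) - 1u);
}
```

Setting 001b in both DMA2 READ and DMA5 WRITE (the footnote example for Table 3-4) gives 2000 0100h.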
3.1.9 CPU Internal Interrupt Enable Register (IIE)

The 32-bit internal interrupt enable register (IIE), shown in Figure 3-5,
enables/disables the following interrupts for the CPU:

□ Timers 0 and 1,

□ For communication ports 0-5:

■ Input-buffer full,

■ Input-buffer ready,

■ Output-buffer ready,

■ Output-buffer empty.

□ DMA coprocessor channels 0-5.

Figure 3-5 shows the IIE register bits, and Table 3-5 describes the interrupt
each bit enables. A 1 in a bit enables the corresponding interrupt; a 0
disables it. At reset, all zeroes are written to the register.
Figure 3-5. Internal Interrupt Enable Register (IIE)

31         30-25
ETINT1     EDMAINT5-EDMAINT0

24         23         22         21         20         19         18         17
EOCEMPTY5  EOCRDY5    EICRDY5    EICFULL5   EOCEMPTY4  EOCRDY4    EICRDY4    EICFULL4

16         15         14         13         12         11         10         9
EOCEMPTY3  EOCRDY3    EICRDY3    EICFULL3   EOCEMPTY2  EOCRDY2    EICRDY2    EICFULL2

8          7          6          5          4          3          2          1          0
EOCEMPTY1  EOCRDY1    EICRDY1    EICFULL1   EOCEMPTY0  EOCRDY0    EICRDY0    EICFULL0   ETINT0

All bits are R/W.
R = Read, W = Write, R/W = Read/Write



Table 3-5. Summary of Interrupt Enable Register Bits (IIE)

IIE Bit Field Name    IIE Bit Numbers for x =  Enables/Disables (Note 1)
                      0   1   2   3   4   5
EICFULLx (Note 2)     1   5   9   13  17  21      Comm. port x input-buffer full interrupt
EICRDYx (Note 2)      2   6   10  14  18  22      Comm. port x input-buffer ready interrupt
EOCRDYx (Note 2)      3   7   11  15  19  23      Comm. port x output-buffer ready interrupt
EOCEMPTYx (Note 2)    4   8   12  16  20  24      Comm. port x output-buffer empty interrupt
EDMAINTx (Note 2)     25  26  27  28  29  30      DMA coprocessor channel x interrupt
ETINT0                0                           Timer 0 interrupt
ETINT1                31                          Timer 1 interrupt

NOTES: 1. The x represents a corresponding communication port number (0-5) or
          DMA coprocessor channel number (0-5). For example, ones in bits 5 and
          25 enable interrupts for (a) input-buffer full at communication port 1
          and (b) DMA coprocessor channel 0. (A 1 enables the interrupt; a 0
          disables it.)

       2. Communication port bits are grouped by communication port number:
          communication port 0's bits are 1, 2, 3, 4; communication port 1's
          bits are 5, 6, 7, 8; etc. The DMA coprocessor channel interrupts are
          grouped the same way (e.g., EDMAINT0 at bit 25, EDMAINT1 at bit 26,
          etc.).
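The bit numbers in Table 3-5 follow a simple formula. Communication port x owns bits 4x + 1 through 4x + 4, DMA channel x is bit 25 + x, and the timers sit at bits 0 and 31. The following C sketch encodes that formula. The enum and helper names are hypothetical, and the sketch only computes masks; it does not write the register.

```c
#include <stdint.h>

/* Per-port interrupt kinds, in the bit order of Table 3-5 (hypothetical names). */
typedef enum {
    IIE_ICFULL  = 0,   /* EICFULLx:  input-buffer full   */
    IIE_ICRDY   = 1,   /* EICRDYx:   input-buffer ready  */
    IIE_OCRDY   = 2,   /* EOCRDYx:   output-buffer ready */
    IIE_OCEMPTY = 3    /* EOCEMPTYx: output-buffer empty */
} iie_port_irq;

#define IIE_ETINT0_BIT 0u
#define IIE_ETINT1_BIT 31u

/* Comm port x (0-5) owns four consecutive bits starting at bit 4x + 1. */
static unsigned iie_port_bit(unsigned port, iie_port_irq kind) {
    return 1u + 4u * port + (unsigned)kind;
}

/* DMA channel x (0-5) enable is bit 25 + x (EDMAINT0 = bit 25). */
static unsigned iie_dma_bit(unsigned channel) {
    return 25u + channel;
}

static uint32_t iie_mask(unsigned bit) {
    return 1u << bit;
}
```

The Note 1 example (input-buffer full on port 1 plus DMA channel 0) works out to 0200 0020h.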
3.1.10 IIOF Flag Register (IIF) Controls External Pins IIOF(3-0),
Timer/DMA Flags

The IIF register controls the external interrupt pins IIOF(3-0). Use it to
specify:

□ which IIOF pins are used for general-purpose I/O and which are used
for interrupts,

□ whether a general-purpose pin is input (read only) or output
(read/write),

□ whether an interrupt pin is for edge-triggered or level-triggered
interrupts,

□ whether an interrupt is enabled or disabled.

Figure 3-6 depicts the IIF register bits. Table 3-6 (page 3-13) explains
these bits in detail. Interrupt traps are shown in Figure 3-7 (page 3-15).
Interrupts are further explained in Section 6.7 on page 6-23.



Figure 3-6. Interrupt Flag Register (IIF)

31      30-25             24      23-17   16
TINT1   DMAINT5-DMAINT0   TINT0   xx      NMI

15      14      13      12      11      10      9       8
EIIOF3  FLAG3   TYPE3   FUNC3   EIIOF2  FLAG2   TYPE2   FUNC2

7       6       5       4       3       2       1       0
EIIOF1  FLAG1   TYPE1   FUNC1   EIIOF0  FLAG0   TYPE0   FUNC0

R = Read (only), R/W = Read/Write, xx = Reserved, read as 0
Table 3-6. IIF Register Bits Summary

Bit Field Name     IIF Bit Nos. (x = 0, 1, 2, 3)     Function (Note 1)

FUNCx (Note 2)     0, 4, 8, 12
    Mode of pin IIOFx:
    If FUNCx = 0, pin IIOFx is a general-purpose I/O (R/W) pin.
    If FUNCx = 1, pin IIOFx is an interrupt (R) pin.

TYPEx (Note 2)     1, 5, 9, 13
    Type of function for pin IIOFx:
    If pin IIOFx is a general-purpose I/O pin (FUNCx = 0):
      TYPEx = 0 makes IIOFx an input pin.
      TYPEx = 1 makes IIOFx an output pin.
    If pin IIOFx is an interrupt pin (FUNCx = 1):
      TYPEx = 0 makes IIOFx an edge-triggered latched interrupt.
      TYPEx = 1 makes IIOFx a level-triggered unlatched interrupt.

FLAGx (Note 2)     2, 6, 10, 14
    Flag for pin IIOFx:
    If pin IIOFx is a general-purpose input pin (FUNCx = 0, TYPEx = 0),
      FLAGx = the value of pin IIOFx and is read only.
    If pin IIOFx is a general-purpose output pin (FUNCx = 0, TYPEx = 1),
      FLAGx = the value on pin IIOFx and is R/W.
    If pin IIOFx is an interrupt pin (FUNCx = 1):
      FLAGx = 0 if the interrupt is not asserted.
      FLAGx = 1 if the interrupt is asserted.
      If 0 is written to FLAGx, the corresponding interrupt is cleared
      unless an interrupt is asserted on the same pin at the same time;
      in that case, the interrupt is set.

EIIOFx (Note 2)    3, 7, 11, 15
    Disable/enable external interrupt:
      EIIOFx = 0 disables external interrupts at pin IIOFx.
      EIIOFx = 1 enables external interrupts at pin IIOFx.

NMI                16
    Nonmaskable interrupt flag (NMI). The NMI interrupt (on the external
    NMI pin) behaves like other interrupts, except that it cannot be masked
    (disabled) by the GIE bit (ST bit 13) or by writing to the NMI bit itself.
    It is temporarily masked during delayed branches and multicycle CPU
    operations. At reset, this bit is cleared. An asserted interrupt is
    cleared only by servicing the interrupt. NMI is a negative-going,
    edge-triggered, latched interrupt. This bit is read only.
      Reading NMI as 0 indicates the interrupt is not asserted.
      Reading NMI as 1 indicates the interrupt is asserted.

Reserved           17-23
    Reserved; read as zeroes.

TINT0, TINT1       24, 31
    Timer interrupt flags 0 and 1:
      Reading TINTx as 0 indicates the timer interrupt is not asserted.
      Reading TINTx as 1 indicates the timer interrupt is asserted.
      A zero written to this bit clears the interrupt unless the interrupt
      is asserted at the same time; in that case, the interrupt is shown as
      asserted.

DMAINTx            25-30
    Interrupt flags for DMA coprocessor channels 0 to 5:
      Reading DMAINTx as 0 indicates the channel interrupt is not asserted.
      Reading DMAINTx as 1 indicates the channel interrupt is asserted.
      A zero written to this bit clears the interrupt unless the interrupt
      is asserted at the same time; in that case, the interrupt is shown as
      asserted.

NOTES: 1. The x represents the corresponding IIOF interrupt pin (IIOF3-IIOF0).
          R = Read, R/W = Read/Write.
       2. Each pin's bits are grouped in the same way as each communication
          port's bits in the IIE register (Table 3-5, Note 2, on page 3-11).
          For example, bits 0, 1, 2, 3 apply to pin IIOF0; bits 4, 5, 6, 7
          apply to IIOF1; etc.
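Each IIOF pin is controlled by a 4-bit group at bits 4x through 4x + 3 (FUNC, TYPE, FLAG, EIIOF). The following C sketch decodes a pin's mode from an IIF value and builds the configuration bits for one pin. The enum and function names are hypothetical. FLAG is left at 0 when building, because its meaning depends on the pin mode.

```c
#include <stdint.h>

typedef enum { IIOF_GP_INPUT, IIOF_GP_OUTPUT, IIOF_INT_EDGE, IIOF_INT_LEVEL } iiof_mode;

/* Bit 'offset' (0 = FUNC, 1 = TYPE, 2 = FLAG, 3 = EIIOF) of pin x's group. */
static unsigned iif_bit(uint32_t iif, unsigned pin, unsigned offset) {
    return (unsigned)((iif >> (4u * pin + offset)) & 1u);
}

/* Decodes FUNCx/TYPEx as described in Table 3-6. */
static iiof_mode iif_pin_mode(uint32_t iif, unsigned pin) {
    unsigned func = iif_bit(iif, pin, 0u);
    unsigned type = iif_bit(iif, pin, 1u);
    if (func == 0u)
        return type ? IIOF_GP_OUTPUT : IIOF_GP_INPUT;
    return type ? IIOF_INT_LEVEL : IIOF_INT_EDGE;
}

static unsigned iif_pin_flag(uint32_t iif, unsigned pin) { return iif_bit(iif, pin, 2u); }
static unsigned iif_pin_enabled(uint32_t iif, unsigned pin) { return iif_bit(iif, pin, 3u); }

/* Builds FUNCx, TYPEx, and EIIOFx for one pin (FLAGx left at 0). */
static uint32_t iif_pin_field(unsigned pin, unsigned func, unsigned type, unsigned enable) {
    uint32_t nibble = (func & 1u) | ((type & 1u) << 1) | ((enable & 1u) << 3);
    return nibble << (4u * pin);
}
```

For example, configuring IIOF3 as an enabled, edge-triggered interrupt sets bits 12 and 15 (0000 9000h).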
3.1.11 Block-Repeat (RS, RE) and Repeat-Count (RC) Registers

The 32-bit repeat start address register (RS) contains the starting address 
of the block of program memory to be repeated when operating in the repeat 
mode. 

The 32-bit repeat end address register (RE) contains the ending address 
of the block of program memory to be repeated when operating in the repeat 
mode. 

The repeat-count register (RC) is a 32-bit register used to specify the
number of times a block of code is to be repeated when performing a block
repeat. If RC contains the number n, the loop is executed n + 1 times.
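To make the n + 1 rule concrete, the following C sketch is a behavioral model, not a description of the CPU's pipeline or exact RC sequencing. It lists the program addresses fetched when the block RS..RE is repeated with a given RC. It assumes a straight-line block with one instruction per word and RS <= RE. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the addresses fetched for block RS..RE into out[] and returns how
   many were written; stops early if out[] (capacity cap) fills up.
   RC = n produces n + 1 passes through the block. */
static size_t block_repeat_trace(uint32_t rs, uint32_t re, uint32_t rc,
                                 uint32_t *out, size_t cap) {
    size_t n = 0;
    uint64_t passes = (uint64_t)rc + 1u;         /* 64-bit to avoid overflow */
    for (uint64_t p = 0; p < passes; p++) {
        for (uint64_t pc = rs; pc <= re; pc++) { /* empty if rs > re */
            if (n == cap)
                return n;
            out[n++] = (uint32_t)pc;
        }
    }
    return n;
}
```

With RS = 100h, RE = 101h, and RC = 1, the trace is 100h, 101h, 100h, 101h: two passes. So to run a block N times, RC is loaded with N - 1.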

3.1.12 Program Counter (PC) 

The program counter (PC) is a 32-bit register containing the address of the 
next instruction to be fetched. While the program counter is not part of the 
CPU register file, it is a register that can be modified by instructions that 
modify the program flow. 

3.1.13 Reserved Bits and Compatibility 

To retain compatibility with future members of the TMS320C4x family of
microprocessors, reserved bits that are read as zero must be written as
zero. Reserved bits that have an undefined value must not have their
current value modified. In other cases, maintain the reserved bits as
specified.
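The two reserved-bit rules can be expressed as a read-modify-write helper. In the following C sketch, reserved_safe_write is a hypothetical name. The example masks are taken from this chapter: DP bits 31-16 read as zero, and ST bits 31-17 hold an undefined identification value. Using them this way is an illustration, not a prescribed TI procedure.

```c
#include <stdint.h>

/* Computes a register value that obeys the reserved-bit rules:
   zero_mask: reserved bits that read as zero, so they are written as zero;
   keep_mask: reserved bits with undefined values, so their current value is kept. */
static uint32_t reserved_safe_write(uint32_t current, uint32_t desired,
                                    uint32_t zero_mask, uint32_t keep_mask) {
    uint32_t value = desired & ~zero_mask;
    return (value & ~keep_mask) | (current & keep_mask);
}

/* Example masks from this chapter (assumed usage). */
#define DP_RESERVED_ZERO 0xFFFF0000u   /* DP bits 31-16 read as zero */
#define ST_RESERVED_KEEP 0xFFFE0000u   /* ST bits 31-17 have undefined values */
```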
3.2 CPU Expansion Register File

This expansion register file contains two special control registers: 

□ Interrupt-vector table pointer register (IVTP), 

□ Trap-vector table pointer (TVTP). 

Table 3-7. CPU Expansion Registers

Assembler Syntax   Function Name
IVTP               Interrupt-vector table pointer. Points to the start of the
                   interrupt-vector table (shown in Figure 3-8).
TVTP               Trap-vector table pointer. Points to the start of the
                   512-entry trap-vector table (shown in Figure 3-7).



Use the LDEP instruction to load (copy) an expansion register to a primary
register (e.g., to any of the auxiliary registers AR0-AR7; see Table 3-1 on
page 3-3). For example:

LDEP IVTP,AR5 ; IVTP contents to AR5

Likewise, use the LDPE instruction to load (copy) a primary register to an
expansion register. For example:

LDPE AR5,IVTP ; AR5 contents to IVTP

Neither of these instructions affects the status register condition flags.

Note that both the interrupt-vector table and the trap-vector table are
required to lie on a 512-word boundary; thus, the nine least-significant bits
of these pointers are zeroes (512 = 10 0000 0000b = 200h). Write only zeroes
to these bits (although the register forces them to zeroes).

The 32-bit IVTP register points to (is essentially the base address for) the
interrupt-vector table (IVT) in memory. The contents of this table are
depicted in Figure 3-8 on page 3-16.

The 32-bit TVTP register is essentially the base address for the trap-vector
table (TVT) in memory. This table, depicted in Figure 3-7, contains the
vectors for the TRAP instruction's 512 trap addresses (TRAP0-TRAP511).

The interrupt (including RESET; see Section 3.3) and trap maps can be
configured to overlap. At reset, IVTP and TVTP are set to all zeroes.
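Because both tables lie on 512-word boundaries, a vector's address is simply the table pointer (with its nine LSBs forced to zero) plus the entry number. The following C sketch shows that calculation. The helper names are hypothetical. The range check reflects the 512 trap entries TRAP0-TRAP511.

```c
#include <stdint.h>

#define VECTOR_TABLE_ALIGN_MASK 0x1FFu  /* nine LSBs: 512-word boundary */

/* Value an aligned table pointer holds: the nine LSBs cleared. */
static uint32_t align_table_pointer(uint32_t ptr) {
    return ptr & ~VECTOR_TABLE_ALIGN_MASK;
}

/* Address of entry n in a table whose base is table_ptr (IVTP or TVTP). */
static uint32_t vector_address(uint32_t table_ptr, uint32_t n) {
    return align_table_pointer(table_ptr) + n;
}

/* Trap vector address for TRAPn; returns -1 if n is outside TRAP0-TRAP511. */
static int trap_vector_address(uint32_t tvtp, uint32_t n, uint32_t *out) {
    if (n > 511u)
        return -1;
    *out = vector_address(tvtp, n);
    return 0;
}
```

For example, with TVTP = 002F FE00h the TRAP511 vector is at 002F FFFFh.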

Figure 3-7. Trap Vector Table (TVT)

TVTP + 000h    TRAP0
TVTP + 001h    TRAP1
     .           .
     .           .
     .           .
TVTP + 1FDh    TRAP509
TVTP + 1FEh    TRAP510
TVTP + 1FFh    TRAP511
Figure 3-8. Interrupt-Vector Table (IVT)

IVTP + 000h           (Note 1)
IVTP + 001h           NMI (Note 2)
IVTP + 002h           TINT0 (Note 3)
IVTP + 003h           IIOF0 (Note 4)
IVTP + 004h           IIOF1 (Note 4)
IVTP + 005h           IIOF2 (Note 4)
IVTP + 006h           IIOF3 (Note 4)
IVTP + 007h - 00Ch    Unused
IVTP + 00Dh           ICFULL0 (Note 5)
IVTP + 00Eh           ICRDY0 (Note 5)
IVTP + 00Fh           OCRDY0 (Note 5)
IVTP + 010h           OCEMPTY0 (Note 5)
IVTP + 011h           ICFULL1 (Note 5)
IVTP + 012h           ICRDY1 (Note 5)
IVTP + 013h           OCRDY1 (Note 5)
IVTP + 014h           OCEMPTY1 (Note 5)
IVTP + 015h           ICFULL2 (Note 5)
IVTP + 016h           ICRDY2 (Note 5)
IVTP + 017h           OCRDY2 (Note 5)
IVTP + 018h           OCEMPTY2 (Note 5)
IVTP + 019h           ICFULL3 (Note 5)
IVTP + 01Ah           ICRDY3 (Note 5)
IVTP + 01Bh           OCRDY3 (Note 5)
IVTP + 01Ch           OCEMPTY3 (Note 5)
IVTP + 01Dh           ICFULL4 (Note 5)
IVTP + 01Eh           ICRDY4 (Note 5)
IVTP + 01Fh           OCRDY4 (Note 5)
IVTP + 020h           OCEMPTY4 (Note 5)
IVTP + 021h           ICFULL5 (Note 5)
IVTP + 022h           ICRDY5 (Note 5)
IVTP + 023h           OCRDY5 (Note 5)
IVTP + 024h           OCEMPTY5 (Note 5)
IVTP + 025h           DMA INT0 (Note 6)
IVTP + 026h           DMA INT1 (Note 6)
IVTP + 027h           DMA INT2 (Note 6)
IVTP + 028h           DMA INT3 (Note 6)
IVTP + 029h           DMA INT4 (Note 6)
IVTP + 02Ah           DMA INT5 (Note 6)
IVTP + 02Bh           TINT1 (Note 3)
IVTP + 02Ch - 03Fh    Reserved

Notes: 1) Reserved for the reset vector when IVTP = 0000 0000h and
          RESETLOC(1,0) = 00b, or when IVTP = 8000 0000h and
          RESETLOC(1,0) = 10b. See Table 3-8.

       2) NMI (nonmaskable interrupt) is discussed in Section 9.9, page 9-40.

       3) Timer interrupts TINT0 and TINT1 are enabled and programmed by the
          IIE register (subsection 3.1.9, page 3-10) and monitored at the IIF
          register (subsection 3.1.10, page 3-12).

       4) External pins IIOF0-IIOF3 are programmed in the DIE register
          (subsection 3.1.8, page 3-8) and the IIF register.

       5) The communication port I/O buffer full/ready interrupts are enabled
          by the DIE and IIE registers and are also discussed in Table 8-1,
          page 8-10 (OUTPUT LEVEL and INPUT LEVEL bits).

       6) DMA interrupts are enabled at the IIE register and the DMA channel
          control register (at bits TCC and AUX TCC, explained in Table 9-1 on
          page 9-8).
3.3 RESET Vector Mapping

The 'C40's RESET vector can reside in any one of four memory locations.
The values on two external pins (RESETLOC(1,0)) determine the RESET
vector location, as shown in the following table.

Table 3-8. Four RESET Vector Locations Chosen by Values on Pins RESETLOC(1,0)

Value at RESETLOCx Pin     Get RESET Vector        Comment
RESETLOC1   RESETLOC0      From Memory Address
0           0              0000 0000h              Local bus
0           1              7FFF FFFFh              Local bus
1           0              8000 0000h              Global bus
1           1              FFFF FFFFh              Global bus



Note that if pin ROMEN = 1 and the vector at 0000 0000h is enabled (pins
RESETLOC(1,0) = 00), then the vector is mapped to address 0 of internal
ROM.
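The pin-to-address mapping in the table above, combined with the ROMEN rule just stated, can be captured as a small lookup. The following is a minimal Python sketch for illustration only; the function name `reset_vector_location` and its string labels are hypothetical and not part of any TI tool.

```python
# Hypothetical model of the RESETLOC(1,0) -> RESET vector mapping described above.
RESET_VECTOR_ADDRESS = {
    (0, 0): 0x0000_0000,  # local bus
    (0, 1): 0x7FFF_FFFF,  # local bus
    (1, 0): 0x8000_0000,  # global bus
    (1, 1): 0xFFFF_FFFF,  # global bus
}


def reset_vector_location(resetloc1: int, resetloc0: int, romen: int) -> tuple:
    """Return (address, source) for the RESET vector."""
    address = RESET_VECTOR_ADDRESS[(resetloc1, resetloc0)]
    if address == 0x0000_0000 and romen == 1:
        source = "internal ROM"  # vector at 0000 0000h maps to address 0 of on-chip ROM
    elif address < 0x8000_0000:
        source = "local bus"
    else:
        source = "global bus"
    return address, source


print(reset_vector_location(1, 0, romen=0))  # (2147483648, 'global bus')
```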

This mapping scheme of the RESET vector allows the TMS320C40 to be 
integrated easily into systems having other processors with fixed RESET 
vector locations. It also allows you to make the RESET vector either external 
or internal (on-chip ROM) to the processor. 
The TMS320C40's memory space of 4G words (4G × 32 bits, where 1G = $2^{30}$)
is shown in the two memory maps in Figure 3-9. These maps differ only in
the makeup of the lowest address space, 0000 0000h to 000F FFFFh. This
makeup is configured by the value at pin ROMEN (on-chip ROM enable, pin
AK4; the on-chip ROM is reserved):

□ ROMEN = 1. Addresses 0000 0000h - 0000 0FFFh are an accessible on-chip
ROM block (reserved), and 0000 1000h - 000F FFFFh are reserved.

□ ROMEN = 0. The on-chip (reserved) ROM is disabled, and addresses
0000 0000h - 000F FFFFh are accessible over the local bus.



Memory in both maps starting at 0010 0000h is not affected by ROMEN (as
described above for addresses 00000h - FFFFFh). A general summary of
address ranges:

□ 0000 0000h - 000F FFFFh: Local bus or on-chip (reserved) ROM,
depending on the value of pin ROMEN.

□ 0010 0000h - 0010 00FFh: Internal peripherals (DMA coprocessor,
communication ports, timers, etc.).

□ 0010 0100h - 001F FFFFh: Internal peripheral region.

□ 0020 0000h - 002F F7FFh: Reserved.

□ 002F F800h - 002F FBFFh: 1K RAM block 0.

□ 002F FC00h - 002F FFFFh: 1K RAM block 1.

□ 0030 0000h - 07FFF FFFFh: Local bus. If ROMEN = 0, another part of the
local bus is at 0000 0000h - 000F FFFFh. These addresses activate the
local bus.

□ 08000 0000h - 0FFFF FFFFh: Global bus.

Instructions cannot be accessed in the three areas from 0010 0000h through
002F F7FFh (internal peripherals, internal peripheral region, and reserved).

CPU data accesses and DMA accesses can be made from any unreserved
part of the 'C40 memory map. Instruction fetches can take place at any
unreserved area of the 'C40 memory map except the peripheral space
(addresses 0010 0000h - 0010 00FFh).

The 'C40's internal ROM is currently reserved for TI internal use only.
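The address-range summary above lends itself to a simple range classifier. This is a hypothetical Python sketch; the function name and region labels are illustrative assumptions. It follows the ROMEN description earlier in this section, in which ROMEN = 0 makes 0000 0000h - 000F FFFFh part of the local bus.

```python
def memory_region(addr: int, romen: int) -> str:
    """Classify a 32-bit word address according to the 'C40 memory map summary."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address outside the 4G-word space")
    if addr <= 0x000F_FFFF:                      # the only range affected by ROMEN
        if romen == 0:
            return "local bus"
        return "on-chip ROM (reserved)" if addr <= 0x0000_0FFF else "reserved"
    if addr <= 0x0010_00FF:
        return "internal peripherals"
    if addr <= 0x001F_FFFF:
        return "internal peripheral region"
    if addr <= 0x002F_F7FF:
        return "reserved"
    if addr <= 0x002F_FBFF:
        return "RAM block 0"
    if addr <= 0x002F_FFFF:
        return "RAM block 1"
    if addr <= 0x7FFF_FFFF:
        return "local bus"
    return "global bus"


for a in (0x0000_0100, 0x0010_0040, 0x002F_FC00, 0x8000_0000):
    print(hex(a), memory_region(a, romen=1))
```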



3.4.1 Overall Memory Map 

Figure 3-9. Memory Maps 




Address Range                 (a) Internal ROM Disabled     (b) Internal ROM Enabled
                              (ROMEN = 0)                   (ROMEN = 1)
00000 0000h - 00000 0FFFh     Local bus (external)          4K ROM (reserved)
00000 1000h - 0000F FFFFh     Local bus (external)          Reserved
00010 0000h - 00010 00FFh     Peripherals (internal; see Figure 3-10)
00010 0100h - 0001F FFFFh     Reserved
00020 0000h - 0002F F7FFh     Reserved
0002F F800h - 0002F FBFFh     1K RAM block 0 (internal)
0002F FC00h - 0002F FFFFh     1K RAM block 1 (internal)
00030 0000h - 07FFF FFFFh     Local bus (external)
08000 0000h - 0FFFF FFFFh     Global bus (external), 2G words

From 00010 0000h upward, both maps are identical.

3.4.2 Peripheral Bus Memory Map 

This map resides in addresses 0010 0000h - 0010 00FFh, as shown in the
memory map, Figure 3-9. Each peripheral requires a 16-word area.
Figure 3-10. Peripheral Memory Map 



Address Range              Contents (16 words each)                     Reference
0010 0000h - 0010 000Fh    Local and global memory interface control    Subsection 3.4.2.1, Figure 3-11
0010 0010h - 0010 001Fh    Analysis module registers                    Subsection 3.4.2.2, Figure 3-12
0010 0020h - 0010 002Fh    Timer 0 registers                            Subsection 3.4.2.3, Figure 3-13
0010 0030h - 0010 003Fh    Timer 1 registers                            Subsection 3.4.2.3, Figure 3-13
0010 0040h - 0010 004Fh    Communication port 0                         Subsection 3.4.2.4, Figure 3-14
0010 0050h - 0010 005Fh    Communication port 1                         Subsection 3.4.2.4, Figure 3-14
0010 0060h - 0010 006Fh    Communication port 2                         Subsection 3.4.2.4, Figure 3-14
0010 0070h - 0010 007Fh    Communication port 3                         Subsection 3.4.2.4, Figure 3-14
0010 0080h - 0010 008Fh    Communication port 4                         Subsection 3.4.2.4, Figure 3-14
0010 0090h - 0010 009Fh    Communication port 5                         Subsection 3.4.2.4, Figure 3-14
0010 00A0h - 0010 00AFh    DMA coprocessor channel 0                    Subsection 3.4.2.5, Figure 3-15, page 3-24
0010 00B0h - 0010 00BFh    DMA coprocessor channel 1                    Subsection 3.4.2.5, Figure 3-15, page 3-24
0010 00C0h - 0010 00CFh    DMA coprocessor channel 2                    Subsection 3.4.2.5, Figure 3-15, page 3-24
0010 00D0h - 0010 00DFh    DMA coprocessor channel 3                    Subsection 3.4.2.5, Figure 3-15, page 3-24
0010 00E0h - 0010 00EFh    DMA coprocessor channel 4                    Subsection 3.4.2.5, Figure 3-15, page 3-24
0010 00F0h - 0010 00FFh    DMA coprocessor channel 5                    Subsection 3.4.2.5, Figure 3-15, page 3-24
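Because every peripheral occupies a 16-word (10h) block starting at 0010 0000h, the block that contains a peripheral address can be found by shifting the offset right by 4. This is a minimal Python sketch that assumes the block order of the peripheral bus memory map; the names are hypothetical.

```python
PERIPHERAL_BASE = 0x0010_0000
BLOCK_NAMES = (
    ["memory interface control", "analysis module", "timer 0", "timer 1"]
    + [f"communication port {n}" for n in range(6)]
    + [f"DMA coprocessor channel {n}" for n in range(6)]
)


def peripheral_block(addr: int) -> str:
    """Name of the 16-word peripheral block that contains addr."""
    offset = addr - PERIPHERAL_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError(f"{addr:#010x} is outside 0010 0000h-0010 00FFh")
    return BLOCK_NAMES[offset >> 4]  # 16 words = 10h per block


print(peripheral_block(0x0010_0075))  # communication port 3
```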
3.4.2.1 Local and Global Memory Interface Control Registers

These registers control the local and global memory interfaces. They
occupy the first 16-word block of the peripheral bus memory map, shown
in Figure 3-10. The registers themselves are shown in Figure 3-11.
Chapter 7 covers the operation of these registers; a detailed description
is given in Figure 7-2 and Table 7-3 (pages 7-7 and 7-8).

These registers define: 

■ the page sizes used for the two strobes of each port, 

■ address ranges over which the strobes are active, 

■ wait states, and 

■ other similar operations that compose the memory interfaces. 
Figure 3-11. Memory Interface Control Registers



0010 0000h                  Global memory interface control register
0010 0001h - 0010 0003h     Reserved
0010 0004h                  Local memory interface control register
0010 0005h - 0010 000Fh     Reserved



3.4.2.2 Analysis Module Registers 

These registers, the second 16-word block in the peripheral bus memory
map (Figure 3-10), are shown below in Figure 3-12. They are reserved for
emulation functions.

Figure 3-12. Analysis Module Registers 



0010 0010h
0010 0011h
0010 0012h
0010 0013h
3.4.2.3 Timer Registers

This group of registers occupies the 0010 0020h - 0010 003Fh range in the
peripheral bus memory map, Figure 3-10, on page 3-20. Timers and their
registers are covered in detail in Section 9.10 on page 9-45.

Figure 3-13. Timer Registers

Timer 0:
0010 0020h                  Timer 0 control register
0010 0021h - 0010 0023h     Reserved
0010 0024h                  Timer 0 counter register
0010 0025h - 0010 0027h     Reserved
0010 0028h                  Timer 0 period register
0010 0029h - 0010 002Fh     Reserved

Timer 1:
0010 0030h                  Timer 1 control register
0010 0031h - 0010 0033h     Reserved
0010 0034h                  Timer 1 counter register
0010 0035h - 0010 0037h     Reserved
0010 0038h                  Timer 1 period register
0010 0039h - 0010 003Fh     Reserved
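The timer register layout repeats every 10h words, so register addresses can be computed from a base and fixed offsets. This Python sketch assumes that the Timer 0 control register sits at 0010 0020h, by analogy with the Timer 1 control register at 0010 0030h. The helper name is hypothetical.

```python
TIMER_BASE = 0x0010_0020  # timer 0 block; timer 1 starts 10h words later
TIMER_REGISTER_OFFSET = {"control": 0x0, "counter": 0x4, "period": 0x8}


def timer_register(timer: int, register: str) -> int:
    """Return the address of a timer register."""
    if timer not in (0, 1):
        raise ValueError("the 'C40 has timers 0 and 1 only")
    return TIMER_BASE + 0x10 * timer + TIMER_REGISTER_OFFSET[register]


print(hex(timer_register(1, "period")))  # 0x100038
```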
3.4.2.4 Communication Port Memory Map

The communication-port control registers (CPCR) and input and output
FIFO buffers are illustrated below in Figure 3-14. This is the central group
of registers in the peripheral bus memory map, Figure 3-10, on page 3-20.
These are described in more detail in Chapter 8.

Figure 3-14. Communication Port Memory Map 



Port   CPCR                  Input Port    Output Port   Reserved
0      0010 0040h (CPCR0)    0010 0041h    0010 0042h    0010 0043h - 0010 004Fh
1      0010 0050h (CPCR1)    0010 0051h    0010 0052h    0010 0053h - 0010 005Fh
2      0010 0060h (CPCR2)    0010 0061h    0010 0062h    0010 0063h - 0010 006Fh
3      0010 0070h (CPCR3)    0010 0071h    0010 0072h    0010 0073h - 0010 007Fh
4      0010 0080h (CPCR4)    0010 0081h    0010 0082h    0010 0083h - 0010 008Fh
5      0010 0090h (CPCR5)    0010 0091h    0010 0092h    0010 0093h - 0010 009Fh
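Each communication port uses a 16-word block: the CPCR at offset 0, the input FIFO at offset 1, and the output FIFO at offset 2, with the rest reserved. Below is a hypothetical Python helper that assumes this per-port layout.

```python
COMM_PORT_BASE = 0x0010_0040  # communication port 0


def comm_port_registers(port: int) -> dict:
    """Return the CPCR, input FIFO, and output FIFO addresses of one port."""
    if not 0 <= port <= 5:
        raise ValueError("ports are numbered 0-5")
    base = COMM_PORT_BASE + 0x10 * port  # one 16-word block per port
    return {"CPCR": base, "input": base + 1, "output": base + 2}


print(hex(comm_port_registers(2)["output"]))  # 0x100062
```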
3.4.2.5 DMA Coprocessor Registers

The DMA registers (shown below) are the bottom block of registers in the
peripheral bus memory map (Figure 3-10 on page 3-20). These registers
are described in Chapter 9; Figure 9-2 on page 9-5 indexes their descriptions.

Figure 3-15. DMA Coprocessor Memory Map

DMA Channel   Channel Registers            Reserved
0             0010 00A0h - 0010 00A8h      0010 00A9h - 0010 00AFh
1             0010 00B0h - 0010 00B8h      0010 00B9h - 0010 00BFh
2             0010 00C0h - 0010 00C8h      0010 00C9h - 0010 00CFh
3             0010 00D0h - 0010 00D8h      0010 00D9h - 0010 00DFh
4             0010 00E0h - 0010 00E8h      0010 00E9h - 0010 00EFh
5             0010 00F0h - 0010 00F8h      0010 00F9h - 0010 00FFh

Exploded view of each channel's registers:

0010 00z0h    Control register x
0010 00z1h    Source address x
0010 00z2h    Source address index x
0010 00z3h    Transfer counter x
0010 00z4h    Destination address x
0010 00z5h    Destination address index x
0010 00z6h    Link pointer x
0010 00z7h    Auxiliary transfer counter x
0010 00z8h    Auxiliary link pointer x

x = channel number (e.g., all are 1 for channel 1, all 2 for channel 2, etc.).

z = corresponding hexadecimal digit for the channel address (e.g., substitute
an "A" for DMA channel 0, "B" for DMA channel 1, etc.).
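The z-digit convention for DMA registers reduces to simple arithmetic. The channel block starts at 0010 00z0h with z = Ah + channel, and each register's low digit is its position in the exploded view. Below is a minimal Python sketch; the register-name strings and the function name are illustrative assumptions.

```python
DMA_REGISTERS = (
    "control",
    "source address",
    "source address index",
    "transfer counter",
    "destination address",
    "destination address index",
    "link pointer",
    "auxiliary transfer counter",
    "auxiliary link pointer",
)


def dma_register(channel: int, name: str) -> int:
    """Address 0010 00z?h, with z = Ah + channel and ? = register position."""
    if not 0 <= channel <= 5:
        raise ValueError("DMA channels are numbered 0-5")
    z = 0xA + channel
    return 0x0010_0000 | (z << 4) | DMA_REGISTERS.index(name)


print(hex(dma_register(2, "transfer counter")))  # 0x1000c3
```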
3.5 Instruction Cache Architecture

The 128 × 32-bit instruction cache speeds instruction fetches and lowers
system cost. It allows the use of slow external memories while still
achieving single-cycle access performance. The cache also frees the
external buses from program fetches, so those buses can be used for DMA
or other system needs. The cache can operate completely automatically,
without the need for external intervention. It uses a form of the LRU
(least recently used) cache update algorithm.

The instruction cache (see Figure 3-17 on page 3-26) contains 128 32-bit
words of RAM, enough to hold 128 words of program memory. It is divided
into four 32-word segments. Associated with each segment is a 27-bit
segment start address (SSA) register. For each word in the cache, there is
a corresponding single-bit present (P) flag.

When the CPU requests an instruction word, a check is made to determine
whether the word is already in the instruction cache. Figure 3-16 shows
how the cache control algorithm partitions an instruction address. The 27
most significant bits (MSBs) of the instruction address select the segment,
and the 5 least significant bits give the address of the instruction word
within that segment. The 27 MSBs of the instruction address are compared
with the four SSA registers. If a match is found, the relevant P flag is
checked. The P flag indicates whether the word within that segment is
already present in cache memory:

□ P = 1: the word is already present in cache memory.

□ P = 0: the location in cache is invalid (for example, it contains garbage).



Figure 3-16. Address Partitioning for Cache Control Algorithm 

Bits 31-5: segment start address (SSA)
Bits 4-0: instruction word address within segment
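The address split used by the cache controller is a shift and a mask. Here is a small Python sketch; the function name is hypothetical.

```python
def split_cache_address(addr: int) -> tuple:
    """Split a 32-bit instruction address into (SSA tag, word index within segment)."""
    ssa = (addr >> 5) & 0x7FF_FFFF  # 27 most significant bits
    word = addr & 0x1F              # 5 least significant bits: word 0-31
    return ssa, word


ssa, word = split_cache_address(0x0030_0045)
print(hex(ssa), word)  # 0x18002 5
```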



If there is no match, one of the segments must be replaced by the new data.
The segment replaced in this circumstance is determined by the LRU (least
recently used) algorithm. The LRU stack (see upper right of Figure 3-17 on
page 3-26) is maintained for this purpose.
Figure 3-17. Instruction Cache Architecture



(Segment start address registers, 27 bits each; P flags, 1 bit each;
segment words, 32 bits each. Segments 0 through 3 each hold segment
words 0-31 with one P flag per word. The LRU stack lists segment numbers
from the most recently used segment number at the top to the least
recently used segment number at the bottom.)
The LRU stack keeps track of which segment (0 - 3) qualifies as the least
recently used after each access to the cache. Each time a segment is
accessed, its segment number is removed from the LRU stack and pushed
onto the top of the LRU stack. Therefore, the number at the top of the stack
is the most recently used segment number, and the number at the bottom
of the stack is the least recently used segment number.

At RESET, the following occur in the instruction cache:

□ all P flags are set to zero, and

□ the LRU stack is initialized with segment number 0 at the top, followed
by 1, 2, and 3 at the bottom. If any two SSA registers are equal (due to
RESET conditions) and a cache hit occurs, the instruction word is fetched
from the most recently used segment.

When a replacement is necessary, the least recently used segment is
selected for replacement. Also, the 32 P flags for the segment to be
replaced are set to 0, and the segment's SSA register is replaced with the
27 MSBs of the instruction address.

3.5.1 Cache Algorithm 

When the TMS320C40 requests an instruction word from external memory, 
the two possible actions are a cache hit or a cache miss. 

□ Cache Hit. The cache contains the requested instruction, and the
following actions occur:

■ The instruction word is read from the cache.

■ The number of the segment containing the word is removed from
the LRU stack and pushed to the top of the LRU stack (if not already
at the top), thus moving the other segment numbers toward the
bottom of the stack.

□ Cache Miss. The cache does not contain the instruction. There are two
types of cache miss:

■ Subsegment miss. The segment address register matches the
instruction address, but the relevant P flag is not set. The following
actions occur:

■ The instruction word is read from memory and copied into the
cache.

■ The number of the segment containing the word is removed from
the LRU stack and pushed to the top of the LRU stack (if not
already at the top), thus moving the other segment numbers
toward the bottom of the stack.

■ The relevant P flag is set.
■ Segment miss. None of the segment addresses matches the
instruction address. The following actions occur:

■ The least recently used segment is selected for replacement.
The P flags for all 32 words are cleared.

■ The SSA register for the selected segment is loaded with the 27
MSBs of the address of the requested instruction word.

■ The instruction word is fetched and copied into the cache. It goes
into the appropriate word of the least recently used segment. The
P flag for that word is set to 1.

■ The number of the segment containing the instruction word is
removed from the LRU stack and pushed to the top of the LRU
stack, thus moving the other segment numbers toward the
bottom of the stack.
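The hit, subsegment-miss, and segment-miss rules above can be exercised with a small behavioral model. This Python sketch is illustrative only and is not a cycle-accurate model of the hardware. It ignores the CE, CF, and CC control bits, and it treats `memory` as any mapping from address to instruction word. It also assumes the SSA registers start at 0 after reset, because the text does not give their reset value. The model searches the segments in LRU-stack order, so if two SSA registers are equal, the most recently used segment responds, as in the RESET note above.

```python
class InstructionCache:
    """Behavioral sketch of the 'C40 instruction cache: 4 segments x 32 words."""

    SEGMENTS = 4
    WORDS = 32

    def __init__(self):
        self.ssa = [0] * self.SEGMENTS  # assumption: SSA registers read 0 after reset
        self.present = [[False] * self.WORDS for _ in range(self.SEGMENTS)]  # P flags
        self.words = [[0] * self.WORDS for _ in range(self.SEGMENTS)]
        self.lru = [0, 1, 2, 3]  # lru[0] is the top (MRU), lru[-1] the bottom (LRU)

    def _push_to_top(self, seg):
        """Remove seg from the LRU stack and push it onto the top."""
        self.lru.remove(seg)
        self.lru.insert(0, seg)

    def fetch(self, addr, memory):
        """Return (instruction word, outcome) for a fetch from addr."""
        ssa, index = addr >> 5, addr & 0x1F  # 27-bit SSA tag, 5-bit word index
        # Search from most to least recently used, so equal SSAs resolve to the MRU segment.
        for seg in self.lru:
            if self.ssa[seg] == ssa:
                if self.present[seg][index]:
                    outcome = "hit"
                else:
                    self.words[seg][index] = memory[addr]
                    self.present[seg][index] = True
                    outcome = "subsegment miss"
                self._push_to_top(seg)
                return self.words[seg][index], outcome
        # Segment miss: reuse the least recently used segment.
        seg = self.lru[-1]
        self.present[seg] = [False] * self.WORDS  # clear all 32 P flags
        self.ssa[seg] = ssa
        self.words[seg][index] = memory[addr]
        self.present[seg][index] = True
        self._push_to_top(seg)
        return self.words[seg][index], "segment miss"


cache = InstructionCache()
program = {a: 0x0800_0000 + a for a in range(0x100)}  # made-up instruction words
for a in (0x40, 0x40, 0x41, 0x00):
    print(hex(a), cache.fetch(a, program)[1])
```

For example, fetching address 40h from a freshly reset model reports a segment miss and loads segment 3, which is at the bottom of the initial LRU stack.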

3.5.2 Cache and System Memory 

Only instructions may be fetched from the program cache. All reads and
writes of data in memory bypass the cache. Program fetches from internal
memory do not modify the cache and do not generate cache hits or misses.
The program cache is a single-access memory block. Dummy program
fetches (for example, those following a branch) can generate cache misses
and cache updates.

Avoid using self-modifying code. If an instruction resides in cache and 
the corresponding location in primary memory is modified, the copy of the 
instruction in cache is not modified. 

Cache can be used more efficiently by aligning program code on 32-word
address boundaries. To do this, use the ALIGN directive when coding in
assembly language.
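To see why 32-word alignment helps, count the cache segments that a block of code touches. In this hypothetical Python sketch, a 32-word loop that starts mid-segment spans two segments. The same loop, aligned to a 32-word boundary (which the ALIGN directive would do in assembly), fits in one segment. The function names and the loop address are illustrative assumptions.

```python
def segments_spanned(start: int, length: int) -> int:
    """Number of 32-word cache segments touched by `length` words starting at `start`."""
    first = start >> 5
    last = (start + length - 1) >> 5
    return last - first + 1


def align_up(addr: int, boundary: int = 32) -> int:
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


loop_start = 0x0030_0110  # hypothetical unaligned 32-word loop
print(segments_spanned(loop_start, 32))            # 2
print(segments_spanned(align_up(loop_start), 32))  # 1
```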
3.5.3 Cache Control Bits

Four cache control bits are located in the CPU status register: the cache 
clear bit (CC), the cache enable bit (CE), the cache freeze bit (CF), and the 
previous cache freeze bit (PCF) as shown in Figure 3-3 on page 3-6. The 
definitions of these bits are repeated below from Table 3-2. 

Cache Clear Bit (CC). Set CC = 1 to invalidate all entries in the cache
(contents not guaranteed, "garbage"). This bit is always cleared after it is
written to; thus, it is always read as 0. At reset, 0 is written to this bit.
Clearing the cache sets all cache P flags to 0.

Cache Enable Bit (CE). Set CE = 1 to enable the cache, allowing the cache
to be used according to the LRU (least recently used) cache algorithm.
Set CE = 0 to disable the cache; this prevents cache updates or
modifications (thus no cache fetches can be made). At reset, 0 is
written to this bit. Cache clearing (CC = 1) is allowed when CE = 0.

Cache Freeze Bit (CF). Set CF = 1 to freeze the cache (it cannot be written
to), which also freezes manipulation of the LRU (least recently used)
stack. If the cache is enabled (CE = 1), fetches from the cache are
allowed, but modification of the cache contents is not allowed. Cache
clearing (CC = 1) is allowed whether CF = 0 or CF = 1. At reset, this bit
is set to zero. CF is set to one when a trap or interrupt is taken. Also,
the RETIcond and RETIcondD instructions copy PCF to the CF bit.

Table 3-9 defines the effect of the CE and CF bits used in combination.

Table 3-9. Combined Effect of the CE and CF Bits 



CE   CF   Effect
0    0    Cache not enabled
0    1    Cache not enabled
1    0    Cache enabled and not frozen
1    1    Cache enabled and frozen



Previous Cache Freeze Bit (PCF). When an interrupt or trap vector is
taken, the CF value is copied to the PCF bit and the CF bit is set to 1. This
protects the cache during interrupt processing and is particularly useful
when code loops are interrupted. The interrupt service routine may
optionally use the cache under software control. Interrupts may also
be nested, provided that the status register is saved before the
interrupts are enabled. When the instructions RETIcond and RETIcondD are
executed to complete interrupt processing, the contents of the PCF
bit are copied to the CF bit.
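The interaction of CE, CF, and PCF can be summarized as a small state model. This Python sketch is hypothetical. It covers only Table 3-9 and the CF/PCF copy on interrupt entry and on return from interrupt. The reset value of PCF is assumed to be 0 because the text does not state it.

```python
from dataclasses import dataclass


@dataclass
class CacheControl:
    """Hypothetical model of the CE, CF, and PCF status-register bits."""

    ce: int = 0   # reset value 0, per the text
    cf: int = 0   # reset value 0, per the text
    pcf: int = 0  # reset value not given in the text; 0 is an assumption

    def effect(self) -> str:
        """Combined effect of CE and CF (Table 3-9)."""
        if not self.ce:
            return "Cache not enabled"
        return "Cache enabled and frozen" if self.cf else "Cache enabled and not frozen"

    def take_interrupt(self) -> None:
        """Interrupt or trap taken: PCF <- CF, then CF <- 1."""
        self.pcf = self.cf
        self.cf = 1

    def return_from_interrupt(self) -> None:
        """Return from interrupt: CF <- PCF."""
        self.cf = self.pcf


st = CacheControl(ce=1)
st.take_interrupt()
print(st.effect())  # Cache enabled and frozen
st.return_from_interrupt()
print(st.effect())  # Cache enabled and not frozen
```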
Data Formats and Floating-Point Operation




In the TMS320C40 architecture, data is organized into three fundamental
types: integer, unsigned integer, and floating-point. The terms integer and
signed integer are used interchangeably. The TMS320C40 supports short
and single-precision formats for signed and unsigned integers. It also
supports short, single-precision, and extended-precision formats for
floating-point data.

Floating-point operations allow fast, accurate, and precise computation.
Specifically, the TMS320C40 implementation of floating-point arithmetic
performs floating-point operations at integer speeds while avoiding
problems with overflow, operand alignment, and other burdensome tasks
common in integer operations.

This chapter discusses in detail the data formats and floating-point
operations supported on the TMS320C40. Major topics in this chapter are
as follows:

Section                                                              Page

4.1 Signed Integer Formats                                            4-3
  ■ Short Integer Format                                              4-3
  ■ Single-Precision Integer Format                                   4-3

4.2 Unsigned-Integer Formats                                          4-4
  ■ Short Unsigned-Integer Format                                     4-4
  ■ Single-Precision Unsigned-Integer Format                          4-4

4.3 Floating-Point Formats                                            4-5
  ■ Short Floating-Point Format                                       4-6
  ■ Single-Precision Floating-Point Format                            4-7
  ■ Extended-Precision Floating-Point Format                          4-8
  ■ Conversion Between Floating-Point Formats                         4-9

4.4 Floating-Point Conversions, IEEE/'C4x                             4-11
  ■ Converting IEEE Format to Twos-Complement Floating-Point Format   4-12
  ■ Converting Twos-Complement Floating-Point Format to IEEE Format   4-13

4.5 Floating-Point Multiplication                                     4-15
4.6 Floating-Point Addition and Subtraction                           4-20
4.7 Normalization (NORM Instruction)                                  4-24
4.8 Rounding (RND Instruction)                                        4-26
4.9 Floating-Point to Integer Conversions (FIX Instruction)           4-28
4.10 Integer to Floating-Point Conversion (FLOAT Instruction)         4-30
4.11 Reciprocal of Number (RCPF Instruction)                          4-31
4.12 Reciprocal of Square Root (RSQRF Instruction)                    4-33



4.1 Signed Integer Formats 

The TMS320C40 supports two integer formats: a 16-bit short integer format
and a 32-bit single-precision integer format. When extended-precision
registers are used as integer operands, only bits 31-0 are used; bits 39-32
remain unchanged and unused.



4.1.1 Short Integer Format



The short integer format is a 16-bit twos-complement integer format used
for immediate integer operands. For those instructions that assume integer
operands, this format is sign extended to 32 bits (see Figure 4-1). The
range of an integer si, represented in the short integer format, is:

$$-2^{15} \le si \le 2^{15} - 1$$

In Figure 4-1 and other figures in this chapter, s = sign bit. 



Figure 4-1. Short Integer Format and Sign Extension of Short Integer 

(a) Short Integer Format: bits 15-0, with the sign bit s in bit 15.

(b) Sign Extension of a Short Integer: bits 31-16 are copies of the sign bit
(ssssssssssssssss); bits 15-0 hold the short integer.
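Sign extension of a 16-bit immediate, as in Figure 4-1(b), copies bit 15 into bits 31-16. Below is a minimal Python sketch that works on raw bit patterns; the helper names are hypothetical.

```python
def sign_extend_short(value: int) -> int:
    """Sign-extend a 16-bit short integer to a 32-bit word (bits 31-16 copy bit 15)."""
    value &= 0xFFFF
    return (value | 0xFFFF_0000) if value & 0x8000 else value


def to_signed32(word: int) -> int:
    """Interpret a 32-bit word as a twos-complement integer."""
    return word - (1 << 32) if word & 0x8000_0000 else word


word = sign_extend_short(0x8000)
print(hex(word), to_signed32(word))  # 0xffff8000 -32768
```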



4.1.2 Single-Precision Integer Format 

In the single-precision integer format, the integer is represented in
twos-complement notation. The range of an integer sp, represented in the
single-precision integer format, is $-2^{31} \le sp \le 2^{31} - 1$. Figure 4-2
shows the single-precision integer format.



Figure 4-2. Single-Precision Integer Format 

Bits 31-0 hold the twos-complement integer; bit 31 is the sign bit.
4.2 Unsigned-Integer Formats

Two unsigned-integer formats are supported on the TMS320C40: a 16-bit
short format and a 32-bit single-precision format. In extended-precision
registers, the unsigned-integer operands use only bits 31-0; bits 39-32
remain unchanged.

4.2.1 Short Unsigned-Integer Format 

Figure 4-3 shows the 16-bit, short, unsigned-integer format used for
immediate unsigned-integer operands. For those instructions that assume
unsigned-integer operands, this format is zero filled to 32 bits. In Figure 4-3
below, x = MSB (1 or 0).

Figure 4-3. Short Unsigned-Integer Format and Zero Fill 

(a) Short Unsigned-Integer Format: bits 15-0; bit 15 is x, the MSB.

(b) Zero Fill of a Short Unsigned Integer: bits 31-16 are 0000000000000000;
bits 15-0 hold the short unsigned integer.



4.2.2 Single-Precision Unsigned-Integer Format 

In the single-precision unsigned-integer format, the number is represented 
as a 32-bit value, as shown in Figure 4-4. 

Figure 4-4. Single-Precision Unsigned-Integer Format 
4.3 Floating-Point Formats



All TMS320C40 floating-point formats consist of three fields: an exponent
field (e), a single-bit sign field (s), and a fraction field (f). These are
stored as shown in Figure 4-5. The exponent field is a twos-complement
number. The sign field and fraction field may be considered as one unit and
referred to as the mantissa field (man). The mantissa is used to represent
a normalized twos-complement number. In a normalized representation, a
most significant nonsign bit is implied, thus providing an additional bit of
precision. The value of a floating-point number x as a function of the fields
e, s, and f is given as

$$
x = \begin{cases}
01.f \times 2^{e} & \text{if } s = 0 \\
10.f \times 2^{e} & \text{if } s = 1 \\
0 & \text{if } e \text{ is the most negative twos-complement value of the specified exponent field width}
\end{cases}
$$

where 01.f and 10.f are binary twos-complement numbers.



Figure 4-5. Generic Floating-Point Format 



Fields, from most to least significant: e | s | f; s and f together form
man (the mantissa).

Note: e = exponent field
s = single-bit sign field
f = fraction field



Three floating-point formats are supported on the TMS320C40:

□ a short floating-point format (for immediate floating-point operands)
consisting of a 4-bit exponent, 1 sign bit, and an 11-bit fraction,

□ a single-precision format consisting of an 8-bit exponent, 1 sign bit, and
a 23-bit fraction, and

□ an extended-precision format consisting of an 8-bit exponent, 1 sign bit,
and a 31-bit fraction.
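The value rules above can be checked with a small decoder that works for all three formats. This Python sketch assumes the field order e | s | f from most to least significant bit, as in the format figures of this section. Following the generic rule, it treats any word whose exponent is the most negative value as zero. The names are hypothetical. Note that the all-zeros word decodes to 1.0, not 0.

```python
import math

# (exponent bits, fraction bits) for the three formats listed above
SHORT, SINGLE, EXTENDED = (4, 11), (8, 23), (8, 31)


def decode_c40_float(word: int, fmt: tuple) -> float:
    """Value of a 'C40 floating-point word laid out as e | s | f (MSB to LSB)."""
    exp_bits, frac_bits = fmt
    word &= (1 << (exp_bits + 1 + frac_bits)) - 1
    e = word >> (frac_bits + 1)
    if e & (1 << (exp_bits - 1)):  # twos-complement exponent
        e -= 1 << exp_bits
    s = (word >> frac_bits) & 1
    f = word & ((1 << frac_bits) - 1)
    if e == -(1 << (exp_bits - 1)):  # most negative exponent encodes zero
        return 0.0
    mantissa = (-2.0 if s else 1.0) + f / (1 << frac_bits)  # 10.f or 01.f
    return math.ldexp(mantissa, e)


print(decode_c40_float(0x0000, SHORT))        # 1.0
print(decode_c40_float(0x7F7F_FFFF, SINGLE))  # most positive single-precision value
```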
4.3.1 Short Floating-Point Format

In the short floating-point format, floating-point numbers are represented by 
a twos-complement 4-bit exponent field (e) and a twos-complement 12-bit 
mantissa field (man) with an implied most significant nonsign bit. 

Figure 4-6. Short Floating-Point Format 



Bits 15-12: e (4-bit exponent); bit 11: s; bits 10-0: f (11-bit fraction).
The mantissa man occupies bits 11-0.



Operations are performed with an implied binary point between bits 11 and
10. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point
twos-complement number x in the short floating-point format is given by

$$
x = \begin{cases}
01.f \times 2^{e} & \text{if } s = 0 \\
10.f \times 2^{e} & \text{if } s = 1 \\
0 & \text{if } e = -8,\ s = 0,\ f = 0
\end{cases}
$$

You must use the following reserved values to represent zero in the short 
floating-point format: 

e = -8 
s = 0 
f = 0 

The following examples illustrate the range and precision of the short
floating-point format:

Most Positive: $x = (2 - 2^{-11}) \times 2^{7} = 2.5594 \times 10^{2}$

Least Positive: $x = 1 \times 2^{-7} = 7.8125 \times 10^{-3}$

Least Negative: $x = (-1 - 2^{-11}) \times 2^{-7} = -7.8163 \times 10^{-3}$

Most Negative: $x = -2 \times 2^{7} = -2.5600 \times 10^{2}$
4.3.2 Single-Precision Floating-Point Format

In the single-precision format, the floating-point number is represented by 
an 8-bit exponent field (e) and a twos-complement 24-bit mantissa field 
(man) with an implied most significant nonsign bit. 

Operations are performed with an implied binary point between bits 23 and
22. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point number x
is given by

$$
x = \begin{cases}
01.f \times 2^{e} & \text{if } s = 0 \\
10.f \times 2^{e} & \text{if } s = 1 \\
0 & \text{if } e = -128,\ s = 0,\ f = 0
\end{cases}
$$

Figure 4-7. Single-Precision Floating-Point Format 



Bits 31-24: e (8-bit exponent); bit 23: s; bits 22-0: f (23-bit fraction).
The mantissa man occupies bits 23-0.



You must use the following reserved values to represent zero in the
single-precision floating-point format:

e = -128

s = 0

f = 0

The following examples illustrate the range and precision of the
single-precision floating-point format.

Most Positive: $x = (2 - 2^{-23}) \times 2^{127} = 3.4028234 \times 10^{38}$

Least Positive: $x = 1 \times 2^{-127} = 5.8774717 \times 10^{-39}$

Least Negative: $x = (-1 - 2^{-23}) \times 2^{-127} = -5.8774724 \times 10^{-39}$

Most Negative: $x = -2 \times 2^{127} = -3.4028236 \times 10^{38}$
4.3.3 Extended-Precision Floating-Point Format

In the extended-precision format, the floating-point number is represented
by an 8-bit exponent field (e) and a 32-bit mantissa field (man) with an
implied most significant nonsign bit.

Operations are performed with an implied binary point between bits 31 and
30. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point number x
is given by:

$$
x = \begin{cases}
01.f \times 2^{e} & \text{if } s = 0 \\
10.f \times 2^{e} & \text{if } s = 1 \\
0 & \text{if } e = -128,\ s = 0,\ f = 0
\end{cases}
$$

Figure 4-8. Extended-Precision Floating-Point Format 



Bits 39-32: e (8-bit exponent); bit 31: s; bits 30-0: f (31-bit fraction).
The mantissa man occupies bits 31-0.



You must use the following reserved values to represent zero in the
extended-precision floating-point format:

e = -128

s = 0

f = 0

The following examples illustrate the range and precision of the
extended-precision floating-point format:

Most Positive: $x = (2 - 2^{-31}) \times 2^{127} = 3.4028236683 \times 10^{38}$

Least Positive: $x = 1 \times 2^{-127} = 5.8774717541 \times 10^{-39}$

Least Negative: $x = (-1 - 2^{-31}) \times 2^{-127} = -5.8774717569 \times 10^{-39}$

Most Negative: $x = -2 \times 2^{127} = -3.4028236691 \times 10^{38}$
4.3.4 Conversion Between Floating-Point Formats

Floating-point operations assume several different formats for inputs and
outputs. These formats often require conversion from one floating-point
format to another (e.g., short floating-point format to extended-precision
floating-point format). Format conversions occur automatically in hardware,
with no overhead, as a part of the floating-point operations. Examples of the
four conversions are shown below. When a floating-point format zero is
converted to a greater precision format, it is always converted to a valid
representation of zero in that format. In the figures below, s = sign bit of
the exponent.

□ Short floating-point format conversion to single-precision
floating-point format.

(a) Short Floating-Point Format: bits 15-12 = s x x x (exponent);
bits 11-0 = y ... y (sign bit and fraction).

(b) Single-Precision Floating-Point Format: bits 31-24 = s s s s s x x x
(sign-extended exponent); bits 23-12 = y ... y; bits 11-0 = 0 ... 0.

In this format, the exponent field is sign extended and the fraction field
is filled with zeros.

□ Short floating-point format conversion to extended-precision
floating-point format.

(a) Short Floating-Point Format: bits 15-12 = s x x x (exponent);
bits 11-0 = y ... y (sign bit and fraction).

(b) Extended-Precision Floating-Point Format: bits 39-32 = s s s s s x x x
(sign-extended exponent); bits 31-20 = y ... y; bits 19-0 = 0 ... 0.

The exponent field in this format is sign extended and the fraction field
is filled with zeros.
□ Single-precision floating-point format conversion to extended-precision
floating-point format.

(a) Single-Precision Floating-Point Format: bits 31-24 = exponent;
bits 23-0 = sign bit and fraction.

(b) Extended-Precision Floating-Point Format: bits 39-32 = exponent;
bits 31-8 = sign bit and fraction; bits 7-0 = 0 ... 0.

The fraction field is filled with zeros.

□ Extended-precision floating-point format conversion to single-precision
floating-point format.

(a) Extended-Precision Floating-Point Format: bits 39-32 = x (exponent);
bits 31-8 = y ... y (sign bit and upper fraction); bits 7-0 = z ... z.

(b) Single-Precision Floating-Point Format: bits 31-24 = x (exponent);
bits 23-0 = y ... y.

The fraction field is truncated.
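The four conversions above are pure bit manipulations. Below is a Python sketch that operates on raw bit patterns; the function names are hypothetical. Following the text, a zero is mapped to a valid zero of the wider format. This matters for the short format: plain sign extension would turn its zero exponent, -8, into -8 in the wider field rather than the required -128.

```python
def short_to_single(w: int) -> int:
    """16-bit short format -> 32-bit single precision."""
    e = (w >> 12) & 0xF
    if e == 0x8:  # e = -8: short zero -> single zero (e = -128, s = 0, f = 0)
        return 0x8000_0000
    if e & 0x8:
        e |= 0xF0  # sign-extend the 4-bit exponent to 8 bits
    sign_and_fraction = w & 0xFFF  # bit 11 = s, bits 10-0 = f
    return (e << 24) | (sign_and_fraction << 12)  # f gains 12 zero LSBs


def single_to_extended(w: int) -> int:
    """32-bit single precision -> 40-bit extended precision (f gains 8 zero LSBs)."""
    return (((w >> 24) & 0xFF) << 32) | ((w & 0xFF_FFFF) << 8)


def extended_to_single(w: int) -> int:
    """40-bit extended precision -> 32-bit single precision (8 fraction LSBs truncated)."""
    return (((w >> 32) & 0xFF) << 24) | ((w >> 8) & 0xFF_FFFF)


print(hex(short_to_single(0xF800)))  # -1.0 in both formats: 0xff800000
```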
4.4 Floating-Point Conversions (IEEE Std. 754/'C4x)

Figure 4-9. IEEE Single-Precision Std. 754 Floating-Point Format 

31 30 23^2 o 



Figure 4-10. 



man 



This IEEE format is depicted in Figure 4-9 above. The following five cases 
define the value vol a number expressed in this format: 



1) If 


e 


= 255 


and 




o, 


then 


v= NaNt 


2) If 


e 


= 255 


and 


f = 


o, 


then 


v= (-1) s infinite 


3) If 


0< 


e<255, 








then 


V=(-1)s* 2^-127(1./) 


4) If 


e 


= 0 


and 


f * 


0, 


then 


V=(-1)Sx 2-126(0./) 


5) If 


e 


= 0 


and 


f = 


o, 


then 


v= (-1)sQ (zero). 



where s = sign bit; e = the exponent field; f = the fraction field. 

For the above five representations, e is treated as an unsigned integer. Case 1 generates NaN (not a number) and is primarily used for software signaling. Case 4 represents a denormalized number. Case 5 represents positive and negative zero.
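As a cross-check of the five cases, the sketch below evaluates a raw 32-bit IEEE single-precision pattern exactly as the cases are written, treating e as an unsigned integer. The function name is hypothetical. Python's `float` is an IEEE double, which represents every single-precision value exactly, so the results can be compared directly with known bit patterns.

```python
import math

def ieee_single_value(bits: int) -> float:
    """Evaluate a 32-bit IEEE 754 single-precision pattern using cases 1-5."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF                # treated as an unsigned integer
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        return math.nan if f != 0 else sign * math.inf     # cases 1 and 2
    if e > 0:
        return sign * 2.0 ** (e - 127) * (1 + f / 2**23)   # case 3: normalized
    return sign * 2.0 ** -126 * (f / 2**23)                # cases 4 and 5
```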

Figure 4-10. TMS320C4x Single-Precision Twos-Complement Floating-Point Format‡: exponent e in bits 31-24, sign s in bit 23, fraction f in bits 22-0

In comparison, Figure 4-10 shows the 'C40 twos-complement floating-point format. In this format, two cases can be used to define the value v of a number:

1) If $e = -128$, then $v = 0$.

2) If $e \neq -128$, then $v = 2^{e} \times (\bar{s}\,s.f)_2$; that is, the mantissa is 01.f when s = 0 and 10.f (equal to -2 + 0.f) when s = 1.

where s = sign bit; e = the exponent field; f = the fraction field.

† NaN = not a number

‡ Same format as for the TMS320C3x
For this representation, e is treated as a twos-complement integer. The fraction and sign bit form a normalized twos-complement mantissa.
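The twos-complement definition can be sketched the same way. This Python function (a hypothetical helper) decodes a raw 32-bit 'C4x single-precision pattern: the exponent is read as an 8-bit twos-complement integer, and the sign and fraction together form the mantissa 01.f or 10.f. As an assumption, every pattern with e = -128 is treated as zero.

```python
def c4x_single_value(bits: int) -> float:
    """Evaluate a 32-bit 'C4x single-precision pattern (sketch)."""
    e = (bits >> 24) & 0xFF
    if e & 0x80:
        e -= 0x100                   # exponent is a twos-complement integer
    s = (bits >> 23) & 1
    f = bits & 0x7FFFFF
    if e == -128:
        return 0.0                   # reserved exponent: zero
    # Sign and fraction form the mantissa: 01.f when s == 0, 10.f (= -2 + 0.f) when s == 1.
    mantissa = (-2.0 if s else 1.0) + f / 2**23
    return mantissa * 2.0 ** e
```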

Note: Symbols to Differentiate Between IEEE and 'C40 Formats

In order to differentiate between the symbols used to define these two formats, all IEEE fields are subscripted with IEEE (e.g., s_IEEE, e_IEEE, etc.). Similarly, all twos-complement fields are subscripted with two (i.e., e_two, s_two, f_two).

4.4.1 Converting IEEE Format to Twos-Complement 
Floating-Point Format 

The most common conversion is the IEEE to twos-complement format. This 
conversion is done according to rules in the following table: 

Table 4-1. Rules for Converting IEEE Format to Twos-Complement Floating-Point Format

        If These Values Are Present                Then These Values Equal
Case    e_IEEE              s_IEEE   f_IEEE        e_two           s_two   f_two
1       255                 0        any           7Fh             0       7F FFFFh
2       255                 1        any           7Fh             1       00 0000h
3       0 < e_IEEE < 255    0        any           e_IEEE - 7Fh    0       f_IEEE
4       0 < e_IEEE < 255    1        nonzero       e_IEEE - 7Fh    1       f̄_IEEE + 1†
5       0 < e_IEEE < 255    1        0             e_IEEE - 80h    1       00 0000h
6       0                   any      any           80h             0       00 0000h

† f̄_IEEE = ones complement of f_IEEE.

Case 1 maps the IEEE positive NaNs and positive infinity to the single-precision twos-complement most positive number. Overflow is also signaled to allow you to check for these special cases.

Case 2 maps the IEEE negative NaNs and negative infinity to the single-precision twos-complement most negative number. Overflow is also signaled to allow you to check for these special cases.

Case 3 maps the IEEE positive normalized numbers to identically valued twos-complement positive numbers.

Case 4 maps the IEEE negative normalized numbers with a nonzero fraction to identically valued twos-complement negative numbers.

Case 5 maps the IEEE negative normalized numbers with a zero fraction to identically valued twos-complement negative numbers.

Case 6 maps the IEEE positive and negative denormalized numbers and 
positive and negative zeroes to a twos-complement zero. 

The TMS320C40 assumes that the IEEE numbers are stored as integers in memory or in a register. When converted, they are always placed in an extended-precision register, using the exponent and fraction fields of that register. Any arithmetic operations that are performed on the fraction field of the IEEE number should be performed only on the IEEE fraction field. The eight LSBs of the extended-precision register are set to zero.
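Table 4-1 can be applied mechanically to a raw bit pattern. The sketch below is a minimal Python model, assuming the IEEE value arrives as a 32-bit integer and producing a 32-bit 'C4x single-precision pattern rather than filling an extended-precision register; the function name and the returned overflow flag are illustrative, not a TI interface. In case 4, the twos-complement fraction is the ones complement of the IEEE fraction plus one, because -(1.f) = -2 + (1 - 0.f). In case 5, -1.0 × 2^(e - 127) must be written as -2 × 2^(e - 128), which is why 80h rather than 7Fh is subtracted.

```python
def ieee_to_c4x(ieee: int) -> tuple[int, bool]:
    """Apply Table 4-1 to a raw IEEE single-precision pattern (illustrative sketch).

    Returns (c4x_bits, overflow): a 32-bit 'C4x single-precision pattern and a
    flag that is True for the special cases that signal overflow.
    """
    s = (ieee >> 31) & 1
    e = (ieee >> 23) & 0xFF
    f = ieee & 0x7FFFFF
    if e == 255:                                  # cases 1 and 2: NaN or infinity
        return (0x7F800000 if s else 0x7F7FFFFF), True
    if e == 0:                                    # case 6: zero or denormalized
        return 0x80000000, False
    if s == 0:                                    # case 3: positive normalized
        e2, s2, f2 = e - 0x7F, 0, f
    elif f != 0:                                  # case 4: negative, nonzero fraction
        e2, s2, f2 = e - 0x7F, 1, (~f & 0x7FFFFF) + 1
    else:                                         # case 5: negative, zero fraction
        e2, s2, f2 = e - 0x80, 1, 0
    return ((e2 & 0xFF) << 24) | (s2 << 23) | f2, False
```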

4.4.2 Converting Twos-Complement Floating-Point Format 
to IEEE Format 

This conversion is done according to rules in the following table: 



Table 4-2. Rules for Converting Twos-Complement Floating-Point Format to IEEE Format

        If These Values Are Present                  Then These Values Equal
Case    e_two                 s_two   f_two          e_IEEE         s_IEEE   f_IEEE
1       -128                  any     any            00h            0        00 0000h
2       -127                  any     any            00h            0        00 0000h
3       -126 ≤ e_two ≤ 127    0       any            e_two + 7Fh    0        f_two
4       -126 ≤ e_two ≤ 127    1       nonzero        e_two + 7Fh    1        f̄_two + 1†
5       -126 ≤ e_two < 127    1       0              e_two + 80h    1        00 0000h
6       127                   1       0              FFh            1        00 0000h

† f̄_two = ones complement of f_two.



Case 1 maps a twos-complement zero to a positive IEEE zero. 

Case 2 maps the twos-complement numbers that are too small to be repre- 
sented as normalized IEEE numbers to a positive IEEE zero. 

Case 3 maps the positive twos-complement numbers that are not covered 
by case 2 into the identically valued IEEE number. 
Case 4 maps the negative twos-complement numbers with a nonzero fraction that are not covered by case 2 into the identically valued IEEE number.

Case 5 maps all the negative twos-complement numbers with a zero fraction, except for the most negative twos-complement number and those covered by case 2, into the identically valued IEEE number.

Case 6 maps the most negative twos-complement number to IEEE negative infinity.

The TMS320C4x assumes that the twos-complement numbers are in memory or in an extended-precision register using the exponent and fraction fields of the register (shown in Figure 4-10 on page 4-11). If the value is in an extended-precision register, then only the 24 MSBs of the fraction field are used as the fraction field and for detection of the special cases. The result of the conversion goes to a register as an integer.
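The reverse mapping in Table 4-2 can be sketched the same way. This Python function (hypothetical name) takes a 32-bit 'C4x single-precision pattern, ignoring the extra fraction bits an extended-precision register would hold, and returns the IEEE pattern. Case 5 adds 80h to the exponent because a mantissa of 10.0 equals -2, so -2 × 2^e_two = -1.0 × 2^(e_two + 1), and the IEEE exponent bias is 7Fh.

```python
def c4x_to_ieee(c4x: int) -> int:
    """Apply Table 4-2 to a raw 32-bit 'C4x single-precision pattern (illustrative sketch)."""
    e = (c4x >> 24) & 0xFF
    if e & 0x80:
        e -= 0x100                                # twos-complement exponent
    s = (c4x >> 23) & 1
    f = c4x & 0x7FFFFF
    if e <= -127:                                 # cases 1 and 2: zero or too small
        return 0x00000000
    if s == 0:                                    # case 3: positive number
        return ((e + 0x7F) << 23) | f
    if f != 0:                                    # case 4: negative, nonzero fraction
        return (1 << 31) | ((e + 0x7F) << 23) | ((~f & 0x7FFFFF) + 1)
    if e == 127:                                  # case 6: most negative number
        return 0xFF800000                         # IEEE negative infinity
    return (1 << 31) | ((e + 0x80) << 23)        # case 5: negative, zero fraction
```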
4.5 Floating-Point Multiplication

A floating-point number a can be written in floating-point format as in the following formula, where a(man) is the mantissa and a(exp) is the exponent.

$$a = a(\mathrm{man}) \times 2^{a(\mathrm{exp})}$$

The product of a and b is c, defined as

$$c = a \times b = a(\mathrm{man}) \times b(\mathrm{man}) \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}$$

$$c(\mathrm{man}) = a(\mathrm{man}) \times b(\mathrm{man})$$

$$c(\mathrm{exp}) = a(\mathrm{exp}) + b(\mathrm{exp})$$

During floating-point multiplication, source operands are always assumed 
to be in the extended-precision floating-point format: 

□ If the source of the operands is in short floating-point format, it is extended to the extended-precision floating-point format.

□ If the source of the operands is in single-precision floating-point format, it is extended to extended-precision format.

These conversions occur automatically in hardware with no overhead. All 
results of floating-point multiplications are in the extended-precision format. 
These multiplications occur in a single cycle. 






Figure 4-11. Flowchart for Floating-Point Multiplication
Figure 4-11 is a flowchart showing floating-point multiplication:

1) In step 1 (steps are shown as numbers in parentheses), the 32-bit source operand mantissas are multiplied, producing a 64-bit result c(man). (Note that input and output data are always represented as normalized numbers.)

2) In step 2, the exponents are added, yielding c(exp).

3) Steps 3 through 6 check for special cases of c(man).

4) Step 3 checks whether c(man) in extended-precision format is equal to zero. If c(man) is zero, step 7 sets c(exp) to -128, thus yielding the representation for zero.

5) Steps 4 and 5 normalize the result.

6) If a right shift of one is necessary, then in step 8, c(man) is right-shifted one bit, and one is added to c(exp).

7) If a right shift of two is necessary, then in step 9, c(man) is right-shifted two bits, and two is added to c(exp). Step 6 occurs when the result is already normalized and no shift is needed.

8) In step 10, c(man) is set in the extended-precision floating-point format.

9) Steps 11 through 16 check for special cases of c(exp).

10) If c(exp) has overflowed (step 11), then in step 14, c is set to the most positive extended-precision value if c(man) is positive, or to the most negative extended-precision value if c(man) is negative.

11) If c(exp) has underflowed (step 12), then c is set to zero (step 15); i.e., c(man) = 0 and c(exp) = -128.
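The steps above can be traced with a small exact-arithmetic model. In this Python sketch (hypothetical names, not TI code), a mantissa is a `Fraction` holding its twos-complement value (01.f in [1, 2), 10.f in [-2, -1)), zero is represented only by an exponent of -128, and the saturation values assume the 31-bit extended-precision fraction. Disposing of extra bits (step 10) is not modeled.

```python
from fractions import Fraction

# Assumed 40-bit extended-precision limits (31 fraction bits); exponent -128 encodes zero.
MOST_POSITIVE = (Fraction(2) - Fraction(1, 2**31), 127)   # 01.111...1 x 2**127
MOST_NEGATIVE = (Fraction(-2), 127)                      # 10.000...0 x 2**127
ZERO = (Fraction(0), -128)

def is_normalized(m: Fraction) -> bool:
    """Normalized twos-complement mantissa: 01.f in [1, 2) or 10.f in [-2, -1)."""
    return 1 <= m < 2 or -2 <= m < -1

def fmul(a_man, a_exp, b_man, b_exp):
    """Exact-arithmetic sketch of Figure 4-11; returns (c_man, c_exp)."""
    if a_exp == -128 or b_exp == -128:            # a zero operand gives zero
        return ZERO
    c_man = Fraction(a_man) * b_man               # step 1: multiply mantissas
    c_exp = a_exp + b_exp                         # step 2: add exponents
    for shift in (0, 1, 2):                       # steps 4-9: no shift, >> 1, or >> 2
        if is_normalized(c_man / 2**shift):
            break
    else:
        raise ValueError("operand mantissas must be normalized")
    c_man /= 2**shift
    c_exp += shift
    if c_exp > 127:                               # steps 11 and 14: overflow saturates
        return MOST_POSITIVE if c_man > 0 else MOST_NEGATIVE
    if c_exp < -127:                              # steps 12 and 15: underflow gives zero
        return ZERO
    return c_man, c_exp                           # steps 13 and 16
```

The for/else loop makes the "at most two right shifts" property explicit: a normalized pair of inputs always breaks out of the loop.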
The following examples illustrate how floating-point multiplication is performed on the TMS320C40. For these examples, the implied most significant nonsign bit is made explicit.

Example 4-1. Floating-Point Multiply (Both Mantissas = -2.0)

Let

$$a = -2.0 \times 2^{a(\mathrm{exp})} = 10.00000000000000000000000 \times 2^{a(\mathrm{exp})}$$
$$b = -2.0 \times 2^{b(\mathrm{exp})} = 10.00000000000000000000000 \times 2^{b(\mathrm{exp})}$$

where a and b are both represented in binary form according to the normalized single-precision floating-point format. Then

$$
\begin{array}{rl}
 & 10.00000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 10.00000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 0100.0000000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}
\end{array}
$$

To place this number in the proper normalized format, it is necessary to shift the mantissa two places to the right and add two to the exponent. This yields

$$
\begin{array}{rl}
 & 10.00000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 10.00000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 01.0000000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}) + 2)}
\end{array}
$$

In floating-point multiplication, the exponent of the result may overflow. This can occur 
when the exponents are initially added or when the exponent is modified during normaliza- 
tion. 

Example 4-2. Floating-Point Multiply (Both Mantissas = 1.5)

Let

$$a = 1.5 \times 2^{a(\mathrm{exp})} = 01.10000000000000000000000 \times 2^{a(\mathrm{exp})}$$
$$b = 1.5 \times 2^{b(\mathrm{exp})} = 01.10000000000000000000000 \times 2^{b(\mathrm{exp})}$$

where a and b are both represented in binary form according to the single-precision floating-point format. Then

$$
\begin{array}{rl}
 & 01.10000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 01.10000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 0010.0100000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}
\end{array}
$$

To place this number in the proper normalized format, it is necessary to shift the mantissa one place to the right and add one to the exponent. This yields

$$
\begin{array}{rl}
 & 01.10000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 01.10000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 01.0010000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}) + 1)}
\end{array}
$$

Example 4-3. Floating-Point Multiply (Both Mantissas = 1.0)

Let

$$a = 1.0 \times 2^{a(\mathrm{exp})} = 01.00000000000000000000000 \times 2^{a(\mathrm{exp})}$$
$$b = 1.0 \times 2^{b(\mathrm{exp})} = 01.00000000000000000000000 \times 2^{b(\mathrm{exp})}$$

where a and b are both represented in binary form according to the single-precision floating-point format. Then

$$
\begin{array}{rl}
 & 01.00000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 01.00000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 0001.0000000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}
\end{array}
$$

This number is in the proper normalized format. Therefore, no shift of the mantissa or modification of the exponent is necessary.

These examples have shown cases where the product of two normalized numbers can 
be normalized with a shift of zero, one, or two. For all normalized inputs with the float- 
ing-point format used by the TMS320C40, a normalized result can be produced by a shift 
of zero, one, or two. 

Example 4-4. Floating-Point Multiply Between Positive and Negative Numbers

Let

$$a = 1.0 \times 2^{a(\mathrm{exp})} = 01.00000000000000000000000 \times 2^{a(\mathrm{exp})}$$
$$b = -2.0 \times 2^{b(\mathrm{exp})} = 10.00000000000000000000000 \times 2^{b(\mathrm{exp})}$$

Then

$$
\begin{array}{rl}
 & 01.00000000000000000000000 \times 2^{a(\mathrm{exp})} \\
\times & 10.00000000000000000000000 \times 2^{b(\mathrm{exp})} \\ \hline
 & 1110.0000000000000000000000000000000000000000000000 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}
\end{array}
$$

The result is $c = -2.0 \times 2^{(a(\mathrm{exp}) + b(\mathrm{exp}))}$.

Example 4-5. Floating-Point Multiply by Zero

All multiplications by a floating-point zero yield a result of zero (f = 0, s = 0, and exp = -128).




4.6 Floating-Point Addition and Subtraction 

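In floating-point addition and subtraction, two floating-point numbers a and b can be defined as

$$a = a(\mathrm{man}) \times 2^{a(\mathrm{exp})}$$

$$b = b(\mathrm{man}) \times 2^{b(\mathrm{exp})}$$

The sum (or difference) of a and b can be defined as

$$
c = a \pm b =
\begin{cases}
\left(a(\mathrm{man}) \pm b(\mathrm{man}) \times 2^{-(a(\mathrm{exp}) - b(\mathrm{exp}))}\right) \times 2^{a(\mathrm{exp})}, & \text{if } a(\mathrm{exp}) \ge b(\mathrm{exp}) \\
\left(a(\mathrm{man}) \times 2^{-(b(\mathrm{exp}) - a(\mathrm{exp}))} \pm b(\mathrm{man})\right) \times 2^{b(\mathrm{exp})}, & \text{if } a(\mathrm{exp}) < b(\mathrm{exp})
\end{cases}
$$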

Figure 4-12 is the flowchart for floating-point addition. Since this flowchart assumes signed data, it is also appropriate for floating-point subtraction. In this figure, it is assumed that a(exp) ≤ b(exp). In step 1 (steps are shown as numbers in parentheses), the source exponents are compared, and c(exp) is set equal to the larger of the two source exponents. In step 2, d is set to the difference of the two exponents. In step 3, the mantissa with the smaller exponent, in this case a(man), is right-shifted d bits in order to align the mantissas. After the mantissas have been aligned, they are added (step 4).

Steps 5 through 7 check for special cases of c(man). If c(man) is zero (step 5), then c(exp) is set to its most negative value (step 8) to yield the correct representation of zero. If c(man) has overflowed (step 6), then in step 9, c(man) is right-shifted one bit, and one is added to c(exp). In step 10, the result is normalized. In steps 11 and 12, special cases of c(exp) are tested. If c(exp) has overflowed, then c is set to the most positive extended-precision value if it is positive; otherwise, it is set to the most negative extended-precision value. If c(exp) has underflowed, then c is set to zero (c(exp) = -128 and c(man) = 0).
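The addition flowchart can be modeled the same way. This Python sketch (hypothetical names, not TI code) uses exact `Fraction` mantissas, so, unlike the hardware, it loses no bits when the smaller operand is right-shifted in step 3. It assumes zero is encoded by an exponent of -128 and that subtraction is performed by passing a negated b mantissa; a negated mantissa may not itself be normalized, which the normalization loop handles.

```python
from fractions import Fraction

# Assumed 40-bit extended-precision limits (31 fraction bits); exponent -128 encodes zero.
MOST_POSITIVE = (Fraction(2) - Fraction(1, 2**31), 127)
MOST_NEGATIVE = (Fraction(-2), 127)
ZERO = (Fraction(0), -128)

def is_normalized(m: Fraction) -> bool:
    """Normalized twos-complement mantissa: 01.f in [1, 2) or 10.f in [-2, -1)."""
    return 1 <= m < 2 or -2 <= m < -1

def fadd(a_man, a_exp, b_man, b_exp):
    """Exact-arithmetic sketch of Figure 4-12 for c = a + b; returns (c_man, c_exp)."""
    if a_exp == -128:                             # 0 + b = b (and 0 + 0 = 0)
        return ZERO if b_exp == -128 else (Fraction(b_man), b_exp)
    if b_exp == -128:                             # a + 0 = a
        return Fraction(a_man), a_exp
    if a_exp > b_exp:                             # arrange a(exp) <= b(exp), as in the figure
        a_man, a_exp, b_man, b_exp = b_man, b_exp, a_man, a_exp
    c_exp = b_exp                                 # step 1: larger exponent
    d = b_exp - a_exp                             # step 2: exponent difference
    c_man = Fraction(a_man) / 2**d + b_man        # steps 3-4: align, then add
    if c_man == 0:                                # steps 5 and 8: exact zero
        return ZERO
    if not -2 <= c_man < 2:                       # steps 6 and 9: mantissa overflow
        c_man /= 2
        c_exp += 1
    while not is_normalized(c_man):               # step 10: left-shift to normalize
        c_man *= 2
        c_exp -= 1
    if c_exp > 127:                               # exponent overflow saturates
        return MOST_POSITIVE if c_man > 0 else MOST_NEGATIVE
    if c_exp < -127:                              # exponent underflow gives zero
        return ZERO
    return c_man, c_exp
```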
Figure 4-12. Flowchart for Floating-Point Addition
The following examples describe the floating-point addition and subtraction operations. It is assumed that the data is in the extended-precision floating-point format.

Example 4-6. Floating-Point Addition

In the case of two normalized numbers to be summed, let

$$a = 1.5 = 01.1000000000000000000000000000000 \times 2^{0}$$
$$b = 0.5 = 01.0000000000000000000000000000000 \times 2^{-1}$$

It is necessary to shift b to the right by one so that a and b have the same exponent. This yields

$$b = 0.5 = 00.1000000000000000000000000000000 \times 2^{0}$$

Then

$$
\begin{array}{rl}
 & 01.1000000000000000000000000000000 \times 2^{0} \\
+ & 00.1000000000000000000000000000000 \times 2^{0} \\ \hline
 & 010.0000000000000000000000000000000 \times 2^{0}
\end{array}
$$

As in the case of multiplication, it is necessary to shift the binary point one place to the left and to add one to the exponent. This yields

$$
\begin{array}{rl}
 & 01.1000000000000000000000000000000 \times 2^{0} \\
+ & 00.1000000000000000000000000000000 \times 2^{0} \\ \hline
 & 01.0000000000000000000000000000000 \times 2^{1}
\end{array}
$$

Example 4-7. Floating-Point Subtraction

A subtraction is performed in this example. Let

$$a = 01.0000000000000000000000000000001 \times 2^{0}$$
$$b = 01.0000000000000000000000000000000 \times 2^{0}$$

The operation to be performed is a - b. The mantissas are already aligned because the two numbers have the same exponent. The result is a large cancellation of the upper bits, as shown below.

$$
\begin{array}{rl}
 & 01.0000000000000000000000000000001 \times 2^{0} \\
- & 01.0000000000000000000000000000000 \times 2^{0} \\ \hline
 & 00.0000000000000000000000000000001 \times 2^{0}
\end{array}
$$

The result must be normalized. In this case, a left shift of 31 is required. The exponent of the result is modified accordingly. The result is

$$
\begin{array}{rl}
 & 01.0000000000000000000000000000001 \times 2^{0} \\
- & 01.0000000000000000000000000000000 \times 2^{0} \\ \hline
 & 01.0000000000000000000000000000000 \times 2^{-31}
\end{array}
$$

Example 4-8. Floating-Point Addition With a 32-Bit Shift

This example illustrates a situation where a full 32-bit shift is necessary to normalize the result. Let

$$a = 01.1111111111111111111111111111111 \times 2^{127}$$
$$b = 10.0000000000000000000000000000000 \times 2^{127}$$

The operation to be performed is a + b.

$$
\begin{array}{rl}
 & 01.1111111111111111111111111111111 \times 2^{127} \\
+ & 10.0000000000000000000000000000000 \times 2^{127} \\ \hline
 & 11.1111111111111111111111111111111 \times 2^{127}
\end{array}
$$

Normalizing the result requires a left shift of 32 and a subtraction of 32 from the exponent. The result is

$$
\begin{array}{rl}
 & 01.1111111111111111111111111111111 \times 2^{127} \\
+ & 10.0000000000000000000000000000000 \times 2^{127} \\ \hline
 & 10.0000000000000000000000000000000 \times 2^{95}
\end{array}
$$

Example 4-9. Floating-Point Addition/Subtraction and Zero

When floating-point addition and subtraction are performed with a floating-point 0, the following identities are satisfied:

$$a \pm 0 = a \quad (a \neq 0)$$

$$0 + 0 = 0$$

$$0 - a = -a \quad (a \neq 0)$$
4.7 Normalization (NORM Instruction)



The NORM instruction normalizes an extended-precision floating-point 
number that is assumed to be unnormalized. Since the number is assumed 
to be unnormalized, no implied most significant nonsign bit is assumed. The 
NORM instruction executes the following three steps: 

1) Locates the most significant nonsign bit of the floating-point number.

2) Left shifts to normalize the number.

3) Adjusts the exponent.

Given the extended-precision floating-point value a to be normalized, the normalization, norm( ), is performed as shown in Figure 4-13.



Figure 4-13. Flowchart for NORM Instruction Operation (a(man) = 0 is tested as a special case; otherwise a(man) is sign-extended one bit, k is the number of leading nonsignificant sign bits, c(man) = a(man) << k and c(exp) = a(exp) - k, the most significant nonsign bit is removed, and c(exp) is tested for special cases.)

Example 4-10. NORM Instruction

Assume that an extended-precision register contains the value

man = 00000000000000000001000000000001, exp = 0

When the normalization is performed on a number assumed to be unnormalized, the binary point is assumed to be

man = 0.0000000000000000001000000000001, exp = 0

This number is then sign extended one bit so that the mantissa contains 33 bits.

man = 00.0000000000000000001000000000001, exp = 0

The intermediate result after the most significant nonsign bit is located and the shift performed is:

man = 01.0000000000010000000000000000000, exp = -19

The final 32-bit value output after removing the redundant bit is:

man = 00000000000010000000000000000000, exp = -19

The NORM instruction is useful for counting the number of leading zeros or 
leading ones in a 32-bit field. If the exponent is initially zero, the absolute 
value of the final value of the exponent is the number of leading ones or 
zeros. This instruction is also useful for manipulating unnormalized float- 
ing-point numbers. 
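The three NORM steps can be sketched on a raw 32-bit mantissa field. In this Python model (a hypothetical helper, not TI code), the field is read as a sign bit followed by 31 fraction bits with no implied bit, sign-extended to 33 bits, and shifted left until the bit after the sign is a nonsign bit. As an assumption, an exponent that drops below -127 is reported as zero. With an initial exponent of 0, the magnitude of the result exponent is the count of leading zeros or ones, as described above.

```python
def norm(field: int, exp: int) -> tuple[int, int]:
    """Sketch of NORM on a 32-bit mantissa field (sign bit plus 31 fraction bits)."""
    m = field - (1 << 32) if field & 0x80000000 else field   # signed value of the field
    if m == 0:
        return 0, -128                            # zero special case
    # After sign extension to 33 bits, shift left until the bit after the sign is a
    # nonsign bit, i.e. until m lies in [2**31, 2**32) or [-2**32, -2**31).
    k = 0
    while -(1 << 31) <= m < (1 << 31):
        m <<= 1
        k += 1
    c_exp = exp - k                               # adjust the exponent
    if c_exp < -127:
        return 0, -128                            # assumed: exponent underflow gives zero
    sign = 1 if m < 0 else 0
    # Remove the most significant nonsign bit: keep the sign and the 31 bits below it.
    return (sign << 31) | (m & 0x7FFFFFFF), c_exp
```

For the value in Example 4-10, `norm((1 << 12) | 1, 0)` yields the field with only bit 19 set and an exponent of -19.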
4.8 Rounding (RND Instruction)

The RND instruction rounds a number from the extended-precision float- 
ing-point format to the single-precision floating-point format. Rounding is 
similar to floating-point addition. Given the number a to be rounded, the fol- 
lowing operation is performed first. 

$$c = a(\mathrm{man}) \times 2^{a(\mathrm{exp})} + \left(1 \times 2^{a(\mathrm{exp}) - 24}\right)$$

Next, a conversion from extended-precision floating-point to single-preci- 
sion floating-point format is performed. Given the extended-precision float- 
ing-point value, the rounding, rnd( ), is performed as shown in Figure 4-14. 
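The RND operation can be approximated with exact arithmetic. This Python sketch (hypothetical names) adds one-half of the single-precision LSB, 2^-24, clears the bits below that LSB, renormalizes, and saturates on exponent overflow. It works on mantissa values rather than register bits. As an assumption, it also renormalizes a negative result of exactly -1.0, a corner case the description does not cover.

```python
import math
from fractions import Fraction

LSB = Fraction(1, 2**23)                          # LSB of the single-precision fraction

def normalize(v: Fraction) -> tuple[Fraction, int]:
    """Write nonzero v as m * 2**e with m in [1, 2) or [-2, -1)."""
    m, e = v, 0
    while not (1 <= m < 2 or -2 <= m < -1):
        if m >= 2 or m < -2:
            m, e = m / 2, e + 1                   # too large: shift right
        else:
            m, e = m * 2, e - 1                   # too small: shift left
    return m, e

def rnd(a_man: Fraction, a_exp: int) -> tuple[Fraction, int]:
    """Exact-arithmetic sketch of RND (extended precision -> single precision)."""
    if a_exp == -128:
        return Fraction(0), -128                  # zero stays zero
    c = a_man + LSB / 2                           # add 1/2 LSB, i.e. 2**-24
    c = math.floor(c / LSB) * LSB                 # clear the bits below the LSB
    if c == 0:
        return Fraction(0), -128
    c_man, shift = normalize(c)                   # overflow of c(man) -> shift right one bit
    c_exp = a_exp + shift
    if c_exp > 127:                               # saturate to most positive/negative single value
        return (2 - LSB, 127) if c_man > 0 else (Fraction(-2), 127)
    if c_exp < -127:
        return Fraction(0), -128
    return c_man, c_exp
```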
Figure 4-14. Flowchart for Floating-Point Rounding by the RND Instruction (c(man) = a(man) + $2^{-24}$, one-half of a single-precision LSB; if c(man) = 0, c(exp) = -128; if c(man) overflows, c(man) is shifted right one bit and c(exp) = a(exp) + 1; if c(exp) overflows, c is set to the most positive or most negative single-precision value according to the sign of c(man); finally, the 8 LSBs of c(man) are set to zero.)
4.9 Floating-Point-to-Integer Conversion (FIX Instruction)

Floating-point to integer conversion, using the FIX instructions, allows extended-precision floating-point numbers to be converted to single-precision integers in a single cycle. The floating-point to integer conversion of the value x is referred to here as fix(x). The conversion does not overflow if a, the number to be converted, is in the range

$$-2^{31} \le a \le 2^{31} - 1$$

That is, you must be certain that

$$a(\mathrm{exp}) \le 30$$

If these bounds are not met, an overflow occurs. If an overflow occurs in the positive direction, the output is the most positive integer. If an overflow occurs in the negative direction, the output is the most negative integer. If a(exp) is within the valid range, then a(man), with the implied bit included, is sign-extended and right-shifted (rs) by the amount

$$rs = 31 - a(\mathrm{exp})$$

This right shift (rs) shifts out those bits corresponding to the fractional part of the mantissa. For example:

If 0 < x < 1, then fix(x) = 0.
If -1 < x < 0, then fix(x) = -1.

The flowchart for the floating-point to integer conversion is shown in Figure 4-15.
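The FIX steps reduce to a compare and an arithmetic shift. In this Python sketch (a hypothetical helper), the mantissa with its implied bit is passed as a signed integer scaled by 2^31, so 01.f lies in [2^31, 2^32) and 10.f in [-2^32, -2^31). Python's `>>` on a negative integer sign-extends, which gives the floor behavior shown in the fix(x) examples. As an assumption, a zero operand (exponent -128) is passed with mantissa 2^31, which the shift reduces to 0.

```python
def fix(man: int, exp: int) -> int:
    """Sketch of FIX.

    man: mantissa with its implied bit, as a signed integer scaled by 2**31
         (01.f -> [2**31, 2**32), 10.f -> [-2**32, -2**31)).
    exp: the exponent a(exp). The value converted is man / 2**31 * 2**exp.
    """
    if exp > 30:                                  # out of range: overflow saturates
        return (1 << 31) - 1 if man > 0 else -(1 << 31)
    rs = 31 - exp                                 # right-shift amount from the text
    return man >> rs                              # arithmetic shift: rounds toward -infinity
```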
Figure 4-15. Flowchart for Floating-Point-to-Integer Conversion by FIX Instructions



4.10 Integer-to-Floating-Point Conversion (FLOAT Instruction) 

Integer to floating-point conversion, using the FLOAT instruction, allows single-precision integers to be converted to extended-precision floating-point numbers. The flowchart for this conversion is shown in Figure 4-16.

Figure 4-16. Flowchart for Integer-to-Floating-Point Conversion by FLOAT Instructions (c(exp) is initialized to 30; if a = 0, c(exp) = -128; otherwise k is the number of leading nonsignificant sign bits, c(man) = a << k and c(exp) = 30 - k, and the most significant nonsign bit is removed to give c = float(a).)
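The FLOAT flowchart can be sketched as follows. This Python model (a hypothetical helper) takes a 32-bit signed integer and returns a 40-bit extended-precision pattern: exponent in bits 39-32, sign in bit 31, fraction in bits 30-0. Following the flowchart, the exponent starts at 30 and is reduced by k, the number of leading nonsignificant sign bits. Because a 32-bit integer has only 30 bits below its most significant nonsign bit, the sketch fills the lowest fraction bit with zero; that placement is an assumption about the bit layout.

```python
def c4x_float(i: int) -> int:
    """Sketch of FLOAT: 32-bit signed integer -> 40-bit extended-precision pattern."""
    if not -(1 << 31) <= i < (1 << 31):
        raise ValueError("expects a 32-bit signed integer")
    if i == 0:
        return 0x80 << 32                         # c(exp) = -128 encodes zero
    # k = number of leading nonsignificant sign bits (redundant copies of bit 31)
    k = 31 - (i if i >= 0 else ~i).bit_length()
    man = i << k                                  # now in [2**30, 2**31) or [-2**31, -2**30)
    exp = 30 - k                                  # c(exp) = 30 - k
    sign = 1 if man < 0 else 0
    frac30 = man & 0x3FFFFFFF                     # drop sign and most significant nonsign bit
    return ((exp & 0xFF) << 32) | (sign << 31) | (frac30 << 1)   # assumed: fraction LSB = 0
```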
4.11 Reciprocal (RCPF Instruction)

The RCPF instruction generates a satisfactory estimate of the reciprocal of a floating-point number. The estimate has the correct exponent, and the mantissa is often accurate to the eighth binary position (mantissa error is thus < $2^{-8}$). This estimate may also be used as a seed for an algorithm that computes the reciprocal to even greater accuracy. (The Newton-Raphson algorithm, described in this section, is one such case.)

Figure 4-17 depicts the algorithm used by the RCPF instruction.

□ The input is assumed to be $v = \mathit{vman} \times 2^{\mathit{vexp}}$.

□ The output is assumed to be $x = \mathit{xman} \times 2^{\mathit{xexp}}$.

□ vexp is negated.

□ If vexp = -128, the result is saturated to the most positive number, and the overflow flag is set. The N condition flag is set to the same sign as vsign.



Figure 4-17. RCPF Instruction Algorithm (look-up table: 512 × 8)
The look-up table is addressed by forming a nine-bit address consisting of vsign and bits 22-15 of vfrac. The eight-bit output of the look-up table forms bits 22-15 of xfrac. Bits 14-0 of xfrac are set to zero, and xsign is set to vsign.

The look-up-table values are generated from simulation results.



4.11.1 Reciprocal Algorithm

The RCPF instruction provides an estimate of the reciprocal of a number. The estimate has the correct exponent and a mantissa accurate to the eighth binary place (i.e., the error of the mantissa is < $2^{-8}$). The Newton-Raphson algorithm (shown below) may be used to further extend the mantissa's precision:

$$x[n+1] = x[n]\,\bigl(2 - v \cdot x[n]\bigr)$$

where v = the number whose reciprocal is to be found.

x[0], the seed for the algorithm, is given by RCPF. For each iteration of the algorithm, the number of accurate bits in the mantissa doubles. Using RCPF, you can start with an estimate accurate to eight bits. With one iteration, accuracy is 16 bits in the mantissa, and with a second iteration, accuracy is 32 bits.

The TMS320C4x program to implement this algorithm is shown in Figure 4-18. Each step of the algorithm is labeled along with the corresponding accuracy achieved at the end of the step. The algorithm takes only seven machine cycles.

Figure 4-18. Newton-Raphson Algorithm for Computing the Reciprocal

        RCPF    R0,R1           ; R0 = v, R1 = x[0]
        MPYF    R1,R0,R2        ; R2 = v * x[0]
        SUBRF   2.0,R2          ; R2 = 2 - v * x[0]
        MPYF    R2,R1           ; R1 = x[1] = x[0] * (2 - v * x[0])
                                ; end of first iteration (16-bit accuracy)
        MPYF    R1,R0,R2        ; R2 = v * x[1]
        SUBRF   2.0,R2          ; R2 = 2 - v * x[1]
        MPYF    R2,R1           ; R1 = x[2] = x[1] * (2 - v * x[1])
                                ; end of second iteration (32-bit accuracy)
                                ; R1 = 1/v
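The effect of the iteration can be checked numerically. The RCPF look-up table contents are not listed here, so this Python sketch substitutes a hypothetical seed: 1/v with its mantissa truncated to eight fraction bits, which gives a relative error below 2^-8. Each pass of x[n+1] = x[n](2 - v·x[n]) squares the relative error (if x[n] = (1 - ε)/v, then x[n+1] = (1 - ε²)/v), so two passes bring it well below 2^-30 in double-precision arithmetic.

```python
import math

def seed_reciprocal(v: float) -> float:
    """Hypothetical stand-in for RCPF: 1/v with the mantissa truncated to 8 fraction bits."""
    m, e = math.frexp(1.0 / v)          # 1/v = m * 2**e with 0.5 <= |m| < 1
    m = math.floor(m * 2**9) / 2**9     # keep the leading 1 plus 8 fraction bits
    return math.ldexp(m, e)

def reciprocal(v: float, iterations: int = 2) -> float:
    """Newton-Raphson refinement: x[n+1] = x[n] * (2 - v * x[n])."""
    x = seed_reciprocal(v)              # x[0], accurate to about 8 bits
    for _ in range(iterations):
        x = x * (2.0 - v * x)           # each pass roughly doubles the accurate bits
    return x
```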






4.12 Reciprocal Square Root (RSQRF Instruction) 



The RSQRF instruction generates an estimate of the reciprocal of the square root of a floating-point number. It parallels some of the operational characteristics of the RCPF instruction (Section 4.11):

□ It generates an estimate (in this case, of the reciprocal of the square root of a floating-point number).

□ The mantissa is accurate to the eighth binary place (mantissa error is < $2^{-8}$).

□ Often, this is a satisfactory estimate of the reciprocal of a number's square root; in other cases, it may be used as a seed for an algorithm that computes the reciprocal square root to an even greater accuracy.

Figure 4-19 depicts the RSQRF algorithm.

□ The input is assumed to be $v = \mathit{vman} \times 2^{\mathit{vexp}}$.

□ The output is assumed to be $x = \mathit{xman} \times 2^{\mathit{xexp}}$.

□ vexp + 1 is negated and shifted right one bit with sign extension.

□ If vexp = -128, the result is saturated to the most positive number, and the overflow flag is set.



Figure 4-19. RSQRF Instruction Algorithm
The look-up table is addressed by forming a nine-bit address consisting of the least significant bit of vexp and bits 22-15 of vfrac. The eight-bit output of the look-up table is used to form bits 22-15 of xfrac. Bits 14-0 of xfrac are set to zero, and xsign is set to 0. There is no provision for negative values of v.

The look-up-table values are generated from simulation results.

Of course, given the result of this algorithm, division by a square root is performed by a simple multiplication: $y/\sqrt{v} = y \cdot x[n]$, where x[n] is the estimate of $1/\sqrt{v}$ as determined by the Newton-Raphson algorithm or another algorithm.

4.12.1 Reciprocal Square Root Algorithm 

The RSQRF instruction provides an estimate of the reciprocal of the square root of a number. The estimate has the correct exponent and a mantissa accurate to the eighth binary place (i.e., the error of the mantissa is < $2^{-8}$). The Newton-Raphson algorithm (shown below) may be used to further extend the mantissa's precision:

$$x[n+1] = x[n]\,\left(1.5 - \frac{v}{2}\, x[n]\, x[n]\right)$$

where v = the number whose reciprocal square root is to be found.

The seed for the algorithm, x[0], is given by RSQRF. For each iteration of the algorithm, the number of accurate bits in the mantissa doubles. Using RSQRF, you can start with an estimate accurate to eight bits. With one iteration, accuracy is 16 bits in the mantissa, and with a second iteration, accuracy is 32 bits.



Figure 4-20. Newton-Raphson Algorithm for Computing the Reciprocal Square Root

        RSQRF   R0,R1           ; R0 = v, R1 = x[0]
        MPYF    0.5,R0          ; R0 = v/2
        MPYF    R1,R1,R2        ; R2 = x[0] * x[0]
        MPYF    R0,R2           ; R2 = (v/2) * x[0] * x[0]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[0] * x[0]
        MPYF    R2,R1           ; R1 = x[1]
                                ; end of first iteration (16-bit accuracy)
        MPYF    R1,R1,R2        ; R2 = x[1] * x[1]
        MPYF    R0,R2           ; R2 = (v/2) * x[1] * x[1]
        SUBRF   1.5,R2          ; R2 = 1.5 - (v/2) * x[1] * x[1]
        MPYF    R2,R1           ; R1 = x[2]
                                ; end of second iteration (32-bit accuracy)
                                ; R1 = 1/(v**0.5)
The TMS320C4x program to implement this algorithm is shown in Figure 4-20. Each step of the algorithm is labeled, and the corresponding accuracy achieved is noted at the end of the step. The algorithm takes only ten machine cycles (compared to 30 cycles on the 'C3x without a look-up table).
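The same kind of numerical check applies to the reciprocal square root. The RSQRF table contents are not given here, so this Python sketch again uses a hypothetical seed (1/sqrt(v) truncated to eight fraction bits) and applies x[n+1] = x[n](1.5 - (v/2)x[n]x[n]) twice, mirroring the two iterations of Figure 4-20. If x[n] = (1 - ε)/sqrt(v), one pass leaves a relative error of 1.5ε² - 0.5ε³, so two passes bring it to roughly 2^-30. The square root itself then follows from one more multiplication, v * x[n].

```python
import math

def seed_rsqrt(v: float) -> float:
    """Hypothetical stand-in for RSQRF: 1/sqrt(v) with the mantissa truncated to 8 fraction bits."""
    if v <= 0:
        raise ValueError("this sketch requires v > 0 (RSQRF has no provision for negative v)")
    m, e = math.frexp(1.0 / math.sqrt(v))       # 0.5 <= m < 1
    m = math.floor(m * 2**9) / 2**9             # keep the leading 1 plus 8 fraction bits
    return math.ldexp(m, e)

def rsqrt(v: float, iterations: int = 2) -> float:
    """Newton-Raphson refinement: x[n+1] = x[n] * (1.5 - (v/2) * x[n] * x[n])."""
    half_v = 0.5 * v                            # v/2, computed once (like MPYF 0.5,R0)
    x = seed_rsqrt(v)                           # x[0]
    for _ in range(iterations):
        x = x * (1.5 - half_v * x * x)
    return x
```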

4.12.2 Background on the Reciprocal Square Root 

In many applications, normalization of data values is necessary. Often, the normalizing factor is the square root of another quantity. For example, given a vector, the unit vector in the same direction can be found by dividing the original vector by its length. This involves division by a square root. The 'C40 provides a simple way to determine this quantity directly, instead of taking a two-step approach of finding the square root and then finding its reciprocal.

Of course, given the result of this algorithm, the square root is found by a simple multiplication:

$$\sqrt{v} = v \cdot x[n]$$

where x[n] is the estimate of $1/\sqrt{v}$ as determined by the Newton-Raphson algorithm or some other algorithm.
Chapter 5

Addressing




The TMS320C40 supports four groups of powerful addressing modes. Five types of addressing may be used within the groups, which facilitates access of data from memory, registers, and the instruction word. This chapter details the operation, encoding, and implementation of the addressing modes. It also discusses the management of system stacks, queues, and deques in memory. The major topics in this chapter are:

Section Page 

5.1 Types of Addressing 5-2 

■ Register 5-3 

■ Direct 5-4 

■ Indirect 5-5 

■ Immediate 5-17 

■ PC-Relative 5-17 

5.2 Groups of Addressing Modes 5-19 

■ General Addressing Modes 5-19 

■ Three-Operand Addressing Modes 5-20 

■ Parallel Addressing Modes 5-23 

■ Conditional-Branch Addressing Modes 5-24 

5.3 Circular Addressing 5-25 

5.4 Bit-Reversed Addressing 5-30 

5.5 System Stack Management 5-31 
5.1 Types of Addressing

Five types of addressing allow access of data from memory, registers, and 
the instruction word: 

                                  Subsection   Page

□ Register addressing              5.1.1        5-3

□ Direct addressing                5.1.2        5-4

□ Indirect addressing              5.1.3        5-5

□ Immediate addressing             5.1.4        5-17

□ PC-relative addressing           5.1.5        5-17



Some types of addressing are appropriate for some instructions and not 
others. For this reason, the types of addressing are used in the four different 
groups of addressing modes as follows: 

Subsection Page

□ General addressing modes (G): 5.2.1 5-19 

■ Register 

■ Direct 

■ Indirect 

■ Immediate 

□ Three-operand addressing modes (T): 5.2.2 5-20 

■ Register 

■ Immediate 

■ Indirect 

□ Parallel addressing modes (P): 5.2.3 5-23 

■ Register 

■ Indirect 

□ Conditional-branch addressing modes (B): 5.2.4 5-24 

■ Register 

■ PC-relative 

The five types of addressing are discussed first (subsections 5.1.1 through
5.1.5, beginning on the next page), followed by the four groups of addressing
modes (Section 5.2, beginning on page 5-19).
5.1.1 Register Addressing

In register addressing, a CPU register contains the operand, as shown in
this example:

ABSF R1 ; R1 = |R1|

The machine values, assembler syntax, and assigned functions of the CPU
registers are listed in Table 5-1.

Table 5-1. CPU Register/Assembler Syntax and Function

Register Machine Value   Assembler Syntax   Assigned Function

00h                      R0                 Extended-precision register
01h                      R1                 Extended-precision register
02h                      R2                 Extended-precision register
03h                      R3                 Extended-precision register
04h                      R4                 Extended-precision register
05h                      R5                 Extended-precision register
06h                      R6                 Extended-precision register
07h                      R7                 Extended-precision register
1Ch                      R8                 Extended-precision register
1Dh                      R9                 Extended-precision register
1Eh                      R10                Extended-precision register
1Fh                      R11                Extended-precision register
08h                      AR0                Auxiliary register 0
09h                      AR1                Auxiliary register 1
0Ah                      AR2                Auxiliary register 2
0Bh                      AR3                Auxiliary register 3
0Ch                      AR4                Auxiliary register 4
0Dh                      AR5                Auxiliary register 5
0Eh                      AR6                Auxiliary register 6
0Fh                      AR7                Auxiliary register 7
10h                      DP                 Data-page pointer
11h                      IR0                Index register 0
12h                      IR1                Index register 1
13h                      BK                 Block-size register
14h                      SP                 Active stack pointer
15h                      ST                 Status register
16h                      DIE                DMA coprocessor interrupt enable
17h                      IIE                Internal interrupt enable register
18h                      IIF                IIOF pins and interrupt flag register
19h                      RS                 Repeat start address
1Ah                      RE                 Repeat end address
1Bh                      RC                 Repeat counter
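Because R8-R11 occupy machine values 1Ch-1Fh rather than following R7, a decoder cannot derive register names arithmetically from the 5-bit field. The sketch below is a hypothetical lookup table transcribed from Table 5-1; the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of Table 5-1: machine value and assembler syntax. */
struct cpu_reg {
    uint8_t value;
    const char *name;
};

static const struct cpu_reg cpu_regs[] = {
    {0x00, "R0"},  {0x01, "R1"},  {0x02, "R2"},  {0x03, "R3"},
    {0x04, "R4"},  {0x05, "R5"},  {0x06, "R6"},  {0x07, "R7"},
    {0x08, "AR0"}, {0x09, "AR1"}, {0x0A, "AR2"}, {0x0B, "AR3"},
    {0x0C, "AR4"}, {0x0D, "AR5"}, {0x0E, "AR6"}, {0x0F, "AR7"},
    {0x10, "DP"},  {0x11, "IR0"}, {0x12, "IR1"}, {0x13, "BK"},
    {0x14, "SP"},  {0x15, "ST"},  {0x16, "DIE"}, {0x17, "IIE"},
    {0x18, "IIF"}, {0x19, "RS"},  {0x1A, "RE"},  {0x1B, "RC"},
    {0x1C, "R8"},  {0x1D, "R9"},  {0x1E, "R10"}, {0x1F, "R11"},
};

/* Returns the assembler name for a register field value, or NULL if unknown. */
static const char *cpu_reg_name(unsigned value) {
    for (size_t i = 0; i < sizeof cpu_regs / sizeof cpu_regs[0]; i++) {
        if (cpu_regs[i].value == value) {
            return cpu_regs[i].name;
        }
    }
    return NULL;
}
```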
5.1.2 Direct Addressing



In direct addressing, the data address is formed by the concatenation of the
16 least significant bits of the data-page pointer (DP) with the 16 least
significant bits of the instruction word (expr). This results in 65536 pages
(64K words per page), giving you a large address space without requiring a
change of the page pointer. The syntax and operation for direct addressing
are listed below.

Syntax: @expr 

Operation: address = DP concatenated with expr 

Figure 5-1 shows the formation of the data address. Example 5-1 gives an 
instruction example with data before and after instruction execution. 



Figure 5-1. Direct Addressing

[Figure: bits 15-0 of the data-page pointer (DP) form bits 31-16 of the address, and bits 15-0 of the instruction word (expr) form bits 15-0 of the address.]
Example 5-1. Direct Addressing

ADDI @0BCDEh,R7

Before Instruction:                    After Instruction:

DP = 108Ah                             DP = 108Ah

R7 = 11h                               R7 = 1234 5689h

Data at 108A BCDEh = 1234 5678h        Data at 108A BCDEh = 1234 5678h
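The address formation in Figure 5-1 reduces to a mask and a shift. This C sketch (the function name is hypothetical) reproduces the operand address used in Example 5-1.

```c
#include <stdint.h>

/* Direct addressing: the 16 LSBs of DP become address bits 31-16, and the
   16 LSBs of the instruction word (expr) become address bits 15-0. */
static uint32_t direct_address(uint32_t dp, uint32_t instruction_word) {
    return ((dp & 0xFFFFu) << 16) | (instruction_word & 0xFFFFu);
}
```

With DP = 108Ah and expr = 0BCDEh, direct_address returns 108A BCDEh. Any upper bits in either argument are ignored, because only the 16 LSBs of each participate.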

5.1.3 Indirect Addressing

Indirect addressing is used to specify the address of an operand in memory
through the contents of an auxiliary register, optional displacements, and
index registers. This arithmetic is performed by the auxiliary register
arithmetic units (ARAUs) and is unsigned. (All 32 bits of the auxiliary and
index registers are used in indirect addressing.)

The flexibility of indirect addressing is possible because the ARAUs on the
TMS320C40 modify auxiliary registers in parallel with operations within the
main CPU. Indirect addressing is specified by a five-bit field in the
instruction word, referred to as the mod field (shown in the leftmost column
of Table 5-2 on page 5-6 and in the examples that follow). A displacement
is either an explicit unsigned 5-bit or 8-bit integer contained in the
instruction word or an implicit displacement of one. Two index registers,
IR0 and IR1, can also be used in indirect addressing, enabling the use of
32-bit indirect displacements. Some indirect modes optionally apply circular
or bit-reversed addressing when the auxiliary register is updated. The
mechanism for generating addresses in circular addressing is discussed in
Section 5.3, and bit-reversed addressing in Section 5.4.

Table 5-2 lists the various kinds of indirect addressing, along with the value
of the modification (mod) field, assembler syntax, operation, and function
for each. The succeeding 18 examples show the operation for each kind of
indirect addressing.
Table 5-2. Indirect Addressing

Mod Field   Syntax          Operation                 Description

Indirect Addressing with Displacement

00000       *+ARn(disp)     addr = ARn + disp         With predisplacement add
00001       *-ARn(disp)     addr = ARn - disp         With predisplacement subtract
00010       *++ARn(disp)    addr = ARn + disp         With predisplacement add and modify
                            ARn = ARn + disp
00011       *--ARn(disp)    addr = ARn - disp         With predisplacement subtract and modify
                            ARn = ARn - disp
00100       *ARn++(disp)    addr = ARn                With postdisplacement add and modify
                            ARn = ARn + disp
00101       *ARn--(disp)    addr = ARn                With postdisplacement subtract and modify
                            ARn = ARn - disp
00110       *ARn++(disp)%   addr = ARn                With postdisplacement add and circular
                            ARn = circ(ARn + disp)    modify
00111       *ARn--(disp)%   addr = ARn                With postdisplacement subtract and
                            ARn = circ(ARn - disp)    circular modify

Indirect Addressing with Index Register IR0

01000       *+ARn(IR0)      addr = ARn + IR0          With preindex (IR0) add
01001       *-ARn(IR0)      addr = ARn - IR0          With preindex (IR0) subtract
01010       *++ARn(IR0)     addr = ARn + IR0          With preindex (IR0) add and modify
                            ARn = ARn + IR0
01011       *--ARn(IR0)     addr = ARn - IR0          With preindex (IR0) subtract and modify
                            ARn = ARn - IR0
01100       *ARn++(IR0)     addr = ARn                With postindex (IR0) add and modify
                            ARn = ARn + IR0
01101       *ARn--(IR0)     addr = ARn                With postindex (IR0) subtract and modify
                            ARn = ARn - IR0
01110       *ARn++(IR0)%    addr = ARn                With postindex (IR0) add and circular
                            ARn = circ(ARn + IR0)     modify
01111       *ARn--(IR0)%    addr = ARn                With postindex (IR0) subtract and circular
                            ARn = circ(ARn - IR0)     modify

LEGEND:
addr      memory address
ARn       auxiliary register AR0-AR7
IRn       index register IR0 or IR1
disp      displacement (5 bits or 8 bits on 'C40)
++        add and modify
--        subtract and modify
circ( )   address in circular addressing
%         where circular addressing is performed
Table 5-2. Indirect Addressing (Concluded)

Mod Field   Syntax          Operation                 Description

Indirect Addressing with Index Register IR1

10000       *+ARn(IR1)      addr = ARn + IR1          With preindex (IR1) add
10001       *-ARn(IR1)      addr = ARn - IR1          With preindex (IR1) subtract
10010       *++ARn(IR1)     addr = ARn + IR1          With preindex (IR1) add and modify
                            ARn = ARn + IR1
10011       *--ARn(IR1)     addr = ARn - IR1          With preindex (IR1) subtract and modify
                            ARn = ARn - IR1
10100       *ARn++(IR1)     addr = ARn                With postindex (IR1) add and modify
                            ARn = ARn + IR1
10101       *ARn--(IR1)     addr = ARn                With postindex (IR1) subtract and modify
                            ARn = ARn - IR1
10110       *ARn++(IR1)%    addr = ARn                With postindex (IR1) add and circular
                            ARn = circ(ARn + IR1)     modify
10111       *ARn--(IR1)%    addr = ARn                With postindex (IR1) subtract and circular
                            ARn = circ(ARn - IR1)     modify

Indirect Addressing (Special Cases)

11000       *ARn            addr = ARn                Indirect
11001       *ARn++(IR0)B    addr = ARn                With postindex (IR0) add and bit-reversed
                            ARn = B(ARn + IR0)        modify

LEGEND:
addr      memory address
ARn       auxiliary register AR0-AR7
IRn       index register IR0 or IR1
disp      displacement
++        add and modify
--        subtract and modify
circ( )   address in circular addressing
%         where circular addressing is performed
B         where bit-reversed addressing is performed
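The linear rows of Table 5-2 follow a regular pattern: bits 4-3 of the mod field choose the offset source (disp, IR0, or IR1), and bits 2-0 choose pre- or post-modification and add or subtract. The C sketch below models that pattern under stated assumptions: the function names are hypothetical, the circular (xx110, xx111) and bit-reversed (11001) rows are deliberately left out, and all arithmetic wraps modulo 2^32 as unsigned ARAU arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the mod values this sketch models: xx000-xx101 in the disp,
   IR0, and IR1 groups, plus 11000 (*ARn). Check this before the helpers below. */
static bool is_linear_mod(unsigned mod) {
    if (mod == 0x18u) return true;   /* 11000: *ARn */
    if (mod > 0x17u) return false;   /* 11001 (bit-reversed) and above */
    return (mod & 7u) <= 5u;         /* excludes the circular 110/111 variants */
}

/* Bits 4-3 pick the offset source: 00 = disp, 01 = IR0, 10 = IR1. */
static uint32_t mod_offset(unsigned mod, uint32_t disp, uint32_t ir0, uint32_t ir1) {
    switch ((mod >> 3) & 3u) {
    case 1: return ir0;
    case 2: return ir1;
    default: return disp;            /* 11000 ignores the offset entirely */
    }
}

/* Operand address for a modeled mod value. */
static uint32_t indirect_address(unsigned mod, uint32_t arn, uint32_t offset) {
    if (mod == 0x18u) return arn;    /* *ARn */
    switch (mod & 7u) {
    case 0: case 2: return arn + offset;  /* pre-add, with or without modify */
    case 1: case 3: return arn - offset;  /* pre-subtract, with or without modify */
    default: return arn;                  /* post-modify forms use ARn as is */
    }
}

/* Value written back to ARn for a modeled mod value. */
static uint32_t indirect_updated_arn(unsigned mod, uint32_t arn, uint32_t offset) {
    if (mod == 0x18u) return arn;    /* *ARn leaves ARn unchanged */
    switch (mod & 7u) {
    case 2: case 4: return arn + offset;  /* *++ARn(...) and *ARn++(...) */
    case 3: case 5: return arn - offset;  /* *--ARn(...) and *ARn--(...) */
    default: return arn;                  /* *+ARn(...) and *-ARn(...) do not modify */
    }
}
```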
Example 5-2. Auxiliary Register Indirect

An auxiliary register (ARn) contains the address of the operand to be
fetched.

Operation: operand address = ARn

Assembler Syntax: *ARn

Modification Field: 11000

Example 5-3. Indirect With Predisplacement Add

The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and the displacement (disp). The displacement is either a 5-bit or 8-bit
unsigned integer contained in the instruction word or an implied value of 1.

Operation: operand address = ARn + disp

Assembler Syntax: *+ARn(disp)

Modification Field: 00000

[Figure: disp, with its remaining 27 or 24 bits zero-filled, is added to ARn to form the operand address.]
Example 5-4. Indirect With Predisplacement Subtract

The address of the operand to be fetched is the contents of an auxiliary
register (ARn) minus the displacement (disp). The displacement is either an
8-bit unsigned integer contained in the instruction word or an implied value
of 1.

Operation: operand address = ARn - disp

Assembler Syntax: *-ARn(disp)

Modification Field: 00001



Example 5-5. Indirect With Predisplacement Add and Modify

The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and the displacement (disp). The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.
After the data is fetched, the auxiliary register is updated with the address
generated.

Operation: operand address = ARn + disp
           ARn = ARn + disp

Assembler Syntax: *++ARn(disp)

Modification Field: 00010
Example 5-6. Indirect With Predisplacement Subtract and Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn) minus the displacement (disp). The displacement is either an
8-bit unsigned integer contained in the instruction word or an implied value
of 1. After the data is fetched, the auxiliary register is updated with the
address generated.

Operation: operand address = ARn - disp
           ARn = ARn - disp

Assembler Syntax: *--ARn(disp)

Modification Field: 00011



Example 5-7. Indirect With Postdisplacement Add and Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is added
to the auxiliary register. The displacement is either an 8-bit unsigned integer
contained in the instruction word or an implied value of 1.

Operation: operand address = ARn
           ARn = ARn + disp

Assembler Syntax: *ARn++(disp)

Modification Field: 00100
Example 5-8. Indirect With Postdisplacement Subtract and Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is
subtracted from the auxiliary register. The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.

Operation: operand address = ARn
           ARn = ARn - disp

Assembler Syntax: *ARn--(disp)

Modification Field: 00101



Example 5-9. Indirect With Postdisplacement Add and Circular Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is added
to the contents of the auxiliary register using circular addressing. This result
is used to update the auxiliary register. The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.

Operation: operand address = ARn
           ARn = circ(ARn + disp)

Assembler Syntax: *ARn++(disp)%

Modification Field: 00110
Example 5-10. Indirect With Postdisplacement Subtract and Circular Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is
subtracted from the contents of the auxiliary register through circular
addressing. This result is used to update the auxiliary register. The
displacement is either an 8-bit unsigned integer contained in the instruction
word or an implied value of 1.

Operation: operand address = ARn
           ARn = circ(ARn - disp)

Assembler Syntax: *ARn--(disp)%

Modification Field: 00111



Example 5-11. Indirect With Preindex Add

The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and an index register (IR0 or IR1).

Operation: operand address = ARn + IRm

Assembler Syntax: *+ARn(IRm)

Modification Field: 01000 if m = 0
                    10000 if m = 1
Example 5-12. Indirect With Preindex Subtract

The address of the operand to be fetched is the difference between an
auxiliary register (ARn) and an index register (IR0 or IR1).

Operation: operand address = ARn - IRm

Assembler Syntax: *-ARn(IRm)

Modification Field: 01001 if m = 0
                    10001 if m = 1



Example 5-13. Indirect With Preindex Add and Modify

The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and an index register (IR0 or IR1). After the data is fetched, the
auxiliary register is updated with the address generated.

Operation: operand address = ARn + IRm
           ARn = ARn + IRm

Assembler Syntax: *++ARn(IRm)

Modification Field: 01010 if m = 0
                    10010 if m = 1
Example 5-14. Indirect With Preindex Subtract and Modify

The address of the operand to be fetched is the difference between an
auxiliary register (ARn) and an index register (IR0 or IR1). The resulting
address becomes the new contents of the auxiliary register.

Operation: operand address = ARn - IRm
           ARn = ARn - IRm

Assembler Syntax: *--ARn(IRm)

Modification Field: 01011 if m = 0
                    10011 if m = 1



Example 5-15. Indirect With Postindex Add and Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1)
is added to the auxiliary register.

Operation: operand address = ARn
           ARn = ARn + IRm

Assembler Syntax: *ARn++(IRm)

Modification Field: 01100 if m = 0
                    10100 if m = 1
Example 5-16. Indirect With Postindex Subtract and Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1)
is subtracted from the auxiliary register.

Operation: operand address = ARn
           ARn = ARn - IRm

Assembler Syntax: *ARn--(IRm)

Modification Field: 01101 if m = 0
                    10101 if m = 1



Example 5-17. Indirect With Postindex Add and Circular Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1)
is added to the auxiliary register. This value is evaluated using circular
addressing and replaces the contents of the auxiliary register.

Operation: operand address = ARn
           ARn = circ(ARn + IRm)

Assembler Syntax: *ARn++(IRm)%

Modification Field: 01110 if m = 0
                    10110 if m = 1
Example 5-18. Indirect With Postindex Subtract and Circular Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1)
is subtracted from the auxiliary register. This result is evaluated through
circular addressing and replaces the contents of the auxiliary register.

Operation: operand address = ARn
           ARn = circ(ARn - IRm)

Assembler Syntax: *ARn--(IRm)%

Modification Field: 01111 if m = 0
                    10111 if m = 1



Example 5-19. Indirect With Postindex Add and Bit-Reversed Modify

The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0) is added
to the auxiliary register. This addition is performed with reverse-carry
propagation and can be used to yield a bit-reversed (B) address. This value
replaces the contents of the auxiliary register.

Operation: operand address = ARn
           ARn = B(ARn + IR0)

Assembler Syntax: *ARn++(IR0)B

Modification Field: 11001
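Reverse-carry addition can be modeled on a host by reversing the bit order of both operands, adding normally, and reversing the result. The sketch below assumes the carry ripples across all 32 bits toward bit 0 and is discarded beyond it; the function names are hypothetical. With IR0 = 4 (an assumed 8-point FFT setup, IR0 being half the transform length) and a buffer aligned at 200h, successive updates visit 200h, 204h, 202h, 206h, 201h, 205h, 203h, 207h, and then return to 200h.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits of x. */
static uint32_t reverse_bits32(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Reverse-carry add: carries propagate from higher to lower bit positions.
   A carry out of bit 0 is lost, which the ordinary overflow of the reversed
   sum models exactly. */
static uint32_t bit_reversed_add(uint32_t arn, uint32_t ir0) {
    return reverse_bits32(reverse_bits32(arn) + reverse_bits32(ir0));
}
```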
5.1.4 Immediate Addressing

In immediate addressing, the operand is a 16-bit immediate value
contained in the 16 least significant bits of the instruction word (expr).
Depending upon the data types assumed for the instruction, the immediate
operand may be a twos-complement integer, an unsigned integer, or a
floating-point number. This is the syntax for this mode:

Syntax: expr

Example 5-20 gives an instruction example with before- and
after-instruction data.

Example 5-20. Immediate Addressing

Instruction        Before Instruction    After Instruction

SUBI 1,R0          R0 = 0h               R0 = 00 FFFF FFFFh
LDI 0FFFFh,R0      R0 = 0h               R0 = 00 FFFF FFFFh
LDF 5.0,R0         R0 = 0h               R0 = 02 2000 0000h
OR 0FFFFh,R0       R0 = 0h               R0 = 00 0000 FFFFh
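In Example 5-20 the same 16-bit field 0FFFFh produces FFFF FFFFh for LDI but 0000 FFFFh for OR, which is consistent with sign extension for the integer instructions and zero extension for the logical one. The sketch below models only the low 32 bits of R0 under that reading; the function names are hypothetical, and the floating-point case (LDF) is not modeled.

```c
#include <stdint.h>

/* Integer forms such as SUBI and LDI: treat expr as a 16-bit twos-complement value. */
static uint32_t imm16_sign_extend(uint32_t expr) {
    uint32_t v = expr & 0xFFFFu;
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* Logical forms such as OR: treat expr as a 16-bit unsigned value. */
static uint32_t imm16_zero_extend(uint32_t expr) {
    return expr & 0xFFFFu;
}
```

For instance, SUBI 1,R0 with R0 = 0 corresponds to 0 - imm16_sign_extend(1), which wraps to FFFF FFFFh in the low 32 bits.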



5.1.5 PC-Relative Addressing 

PC-relative addressing is used for branching. Instructions of this type
include Bcond, BcondD, BcondAF, BcondAT, DBcond and DBcondD
(decrement and branch), and LAJ (link and jump). It adds a 16-bit or 24-bit
signed displacement, contained in the least significant bits of the instruction
word, to the PC. The assembler takes the src (a label or address) specified
by the user and generates a displacement. If the branch is a standard branch,
this displacement is equal to [label - (PC + 1)]. If the branch is a delayed
branch, this displacement is equal to [label - (PC + 3)].

The displacement is stored as a 16-bit signed integer in the least significant
bits of the instruction word, except in the 24-bit addressing mode described
below.

Syntax: expr 

Example 5-21 gives an instruction example with before- and
after-instruction data.

Example 5-21. PC-Relative Addressing

BU NEWPC ; PC = 1, NEWPC = 5, displacement = 3

Before Instruction:    After Instruction:

PC = 1h                PC = 5h
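The displacement rule can be checked against Example 5-21 with a short C sketch. The function names are hypothetical, and the target computation simply inverts the same PC + 1 / PC + 3 convention stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* PC value the displacement is measured from:
   PC + 1 for a standard branch, PC + 3 for a delayed branch. */
static uint32_t pc_rel_base(uint32_t pc, bool delayed) {
    return pc + (delayed ? 3u : 1u);
}

/* Displacement the assembler would encode for a branch at pc to label. */
static int32_t pc_rel_displacement(uint32_t pc, uint32_t label, bool delayed) {
    return (int32_t)((int64_t)label - (int64_t)pc_rel_base(pc, delayed));
}

/* Branch target recovered from the encoded displacement (wraps modulo 2^32). */
static uint32_t pc_rel_target(uint32_t pc, int32_t disp, bool delayed) {
    return pc_rel_base(pc, delayed) + (uint32_t)disp;
}
```

For Example 5-21, pc_rel_displacement(1, 5, false) gives 3, and pc_rel_target(1, 3, false) recovers 5.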
The 24-bit addressing mode is used to encode the program control
instructions (e.g., BR, BRD, CALL, RPTB, RPTBD, LAJ). Depending on the
instruction, the new PC value is derived by adding a 24-bit signed value in the
instruction word to the present PC value. Bit 24 determines the type of
branch (D = 0 for a standard branch or D = 1 for a delayed branch). Some of
these instructions are encoded in Figure 5-2.

Figure 5-2. Encoding for 24-Bit PC-Relative Addressing Mode

(a) BR, BRD: unconditional branches (delayed and not delayed)
    Bits 31-25 = 0110000, bit 24 = D, bits 23-0 = src

(b) CALL: unconditional subroutine call
    Bits 31-25 = 0110001, bit 24 = 0, bits 23-0 = src

(c) RPTB, RPTBD: repeat block (not delayed and delayed)
    Bits 31-25 = 0111100, bit 24 = D, bits 23-0 = src

(d) LAJ: link and jump (return address in extended-precision register R11)
    Bits 31-25 = 0110001, bit 24 = 1, bits 23-0 = src
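The fields in Figure 5-2 can be extracted as shown below. This is a sketch: the helper names are hypothetical, and the portable sign extension of the 24-bit src field (flip the sign bit, then subtract its weight) is one of several equivalent idioms.

```c
#include <stdint.h>

static unsigned pc24_opcode(uint32_t w) { return (w >> 25) & 0x7Fu; } /* bits 31-25 */
static unsigned pc24_d_bit(uint32_t w)  { return (w >> 24) & 1u; }    /* bit 24: D */

/* Bits 23-0 (src) interpreted as a signed 24-bit displacement. */
static int32_t pc24_src(uint32_t w) {
    return (int32_t)((w & 0x00FFFFFFu) ^ 0x00800000u) - 0x00800000;
}
```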
5.2 Groups of Addressing Modes

Five types of addressing (covered in Section 5.1, beginning on page 5-2)
form these four groups of addressing modes:

Subsection Page 

□ General addressing modes (G) 5.2.1 5-19 

□ Three-operand addressing modes (T) 5.2.2 5-20 

□ Parallel addressing modes (P) 5.2.3 5-23 

□ Conditional-branch addressing modes (B) 5.2.4 5-24 



5.2.1 General Addressing Modes 

Instructions that use the general addressing modes are general-purpose
instructions, such as ADDI, MPYF, and LSH. Such instructions usually have
this form:

dst operation src -> dst

where the destination operand is signified by dst and the source operand
by src; operation defines the operation to be performed, and the general
addressing modes specify how the src operand is accessed. Bits 31-29 are
zero, indicating general addressing mode instructions. Bits 22 and 21 specify
the general addressing mode (G) field, which defines how bits 15 through 0
are to be interpreted for addressing the src operand.

Options for bits 22 and 21 (G field) are as follows: 

0 0 register (all CPU registers unless specified otherwise) 

0 1 direct 

1 0 indirect 

1 1 immediate 

If the src and dst fields contain register specifications, the values in these
fields are the CPU register machine values defined in Table 5-1. For the
general addressing modes, the following values of ARn are valid for indirect
addressing:

ARn, 0 ≤ n ≤ 7

Figure 5-3 shows the encoding for the general addressing modes. The
notation modn indicates the modification field that goes with the ARn field.
Refer to Table 5-2 for further information.
Figure 5-3. Encoding for General Addressing Modes

Bits 31-29   Bits 28-23   Bits 22-21 (G)   Bits 20-16     Bits 15-0
                                           (Destination)  (Source Operand)

000          operation    00               dst            00000000000 (bits 15-5), src (bits 4-0)
000          operation    01               dst            direct
000          operation    10               dst            modn (bits 15-11), ARn (bits 10-8), disp (bits 7-0)
000          operation    11               dst            immediate

5.2.2 Three-Operand Addressing Modes 

The 19 three-operand instructions on the 'C40 use the eight address forms
listed in Table 5-3:

Table 5-3. Three-Operand Instruction Address Forms

Type 1†

T    src1 Addressing Modes                      src2 Addressing Modes                      dst‡
00   register mode (any CPU register)           register mode (any CPU register)           Rx
01   indirect mode (disp = 0, 1, IR0, IR1)      register mode (any CPU register)           Rx
10   register mode (any CPU register)           indirect mode (disp = 0, 1, IR0, IR1)      Rx
11   indirect mode (disp = 0, 1, IR0, IR1)      indirect mode (disp = 0, 1, IR0, IR1)      Rx

Type 2†

T    src1 Addressing Modes                      src2 Addressing Modes                      dst‡
00   register mode (any CPU register)           8-bit signed immediate                     Rx
01   register mode (any CPU register)           indirect mode *+ARn(5-bit unsigned         Rx
                                                displacement)
10   indirect mode *+ARn(5-bit unsigned         8-bit signed immediate                     Rx
     displacement)
11   indirect mode *+ARn1(5-bit unsigned        indirect mode *+ARn2(5-bit unsigned        Rx
     displacement)                              displacement)

† The 'C40 recognizes either type 1 or type 2 instructions; the 'C30 recognizes only type 1.
‡ Rx = any register in the CPU (primary) register file for the respective processor.



The object values differ for three-operand instructions, depending on the
assembler used:

□ The TMS320C3x assembler recognizes only type 1 formats and sets bits
31-28 to 0010₂.

□ The TMS320C4x assembler recognizes both types and sets bits 31-28
to 0010₂ for type 1 and to 0011₂ for type 2.

The 'C4x processor executes both types (1 and 2). The 'C30 executes only
the type 1 format. The three-operand instructions MPYSHI3 and MPYUHI3
are unique to the 'C40.

All but four of the three-operand instructions can use all four of the type 2
address forms shown in Table 5-3. The exceptions are the floating-point
instructions ADDF3, CMPF3, MPYF3, and SUBF3, which can use only the
second and fourth type 2 address forms (T = 01 and T = 11, the forms
without an 8-bit immediate).

The remaining 15 three-operand instructions are ADDC3, ADDI3, AND3, 
ANDN3, ASH3, CMPI3, LSH3, MPYI3, MPYSHI3, MPYUHI3, OR3, 
SUBB3, SUBI3, TSTB3, and XOR3. 

Note that the 3 can be omitted from a three-operand instruction mnemonic. 

Bits 22 and 21 specify the three-operand addressing mode (T) field, which
defines how bits 15-0 are to be interpreted for addressing the src operands.
Bits 15-8 define the src1 address, and bits 7-0 define the src2 address.

Figure 5-4 and Figure 5-5 show the encoding for 'C4x three-operand
addressing (the 'C30 recognizes only the format in Figure 5-4). The notations
modm and modn indicate the modification fields that go with the ARm and
ARn fields, respectively. Refer to Table 5-2 (page 5-6) for further
information.

The 8-bit signed immediate value supports left shifts, right shifts, and 
memory increment and decrement operations. The immediate value is not 
available for floating-point operations. 

These instructions greatly help reduce code size, both assembled and
compiled. They also give noticeable performance improvements in DSP and
other computationally intensive applications and general-purpose code.
Figure 5-4. Encoding for Type 1 Three-Operand Addressing Modes ('C30 and 'C40)

Bits 31-28   27-23       22-21 (T)   20-16   15-8 (src1)                        7-0 (src2)
0010         operation   00          dst     000 (15-13), src1 (12-8)           000 (7-5), src2 (4-0)
0010         operation   01          dst     modn (15-11), ARn (10-8)           000 (7-5), src2 (4-0)
0010         operation   10          dst     000 (15-13), src1 (12-8)           modn (7-3), ARn (2-0)
0010         operation   11          dst     modn (15-11), ARn (10-8)           modm (7-3), ARm (2-0)



Figure 5-5. Encoding for Type 2 Three-Operand Addressing Modes ('C40 Only)

Bits 31-28   27-23       22-21 (T)   20-16   15-8 (src1)                        7-0 (src2)
0011         operation   00          dst     000 (15-13), Rn (12-8)             immediate (7-0)
0011         operation   01          dst     000 (15-13), Rn (12-8)             disp (7-3), ARn (2-0)
0011         operation   10          dst     disp (15-11), ARn (10-8)           immediate (7-0)
0011         operation   11          dst     disp (15-11), ARn (10-8)           disp (7-3), ARm (2-0)
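The hypothetical helpers below decode the common fields of Figures 5-4 and 5-5, plus the type 2 operand bytes (a 5-bit displacement above a 3-bit auxiliary register number, or an 8-bit signed immediate). They are a sketch that assumes the bit positions shown in those figures.

```c
#include <stdint.h>

/* Bits 31-28: 0010 = type 1, 0011 = type 2; returns 0 for anything else. */
static unsigned three_op_type(uint32_t w) {
    unsigned top = (w >> 28) & 0xFu;
    if (top == 0x2u) return 1u;
    if (top == 0x3u) return 2u;
    return 0u;
}

static unsigned three_op_t_field(uint32_t w) { return (w >> 21) & 3u; }    /* bits 22-21 */
static unsigned three_op_dst(uint32_t w)     { return (w >> 16) & 0x1Fu; } /* bits 20-16 */
static unsigned three_op_src1(uint32_t w)    { return (w >> 8) & 0xFFu; }  /* bits 15-8 */
static unsigned three_op_src2(uint32_t w)    { return w & 0xFFu; }         /* bits 7-0 */

/* Type 2 indirect operand byte: 5-bit displacement, then 3-bit ARn number. */
static unsigned t2_disp5(unsigned field) { return (field >> 3) & 0x1Fu; }
static unsigned t2_arn(unsigned field)   { return field & 7u; }

/* Type 2 immediate byte (T = 00 or 10): 8-bit twos-complement value. */
static int32_t t2_imm8(unsigned field) {
    return (int32_t)((field & 0xFFu) ^ 0x80u) - 0x80;
}
```

For example, 3041 2BFEh is a type 2 word with T = 10 and dst = R1; its src1 byte 2Bh is *+AR3(5), and its src2 byte FEh is the immediate -2.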






5.2.3 Parallel Addressing Modes 

Instructions that use parallel addressing, indicated by || (two vertical bars),
allow for the greatest amount of parallelism possible. The destination
operands are indicated as d1 and d2, signifying dst1 and dst2, respectively
(see Figure 5-6). The source operands, signified by src1 and src2, use the
extended-precision registers. The parallel operation to be performed is called
operation.

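Figure 5-6. Encoding for Parallel Addressing Modes

Bits 31-30 = 10
Bits 29-26 = operation
Bits 25-24 = P
Bit 23     = d1
Bit 22     = d2
Bits 21-19 = src1
Bits 18-16 = src2
Bits 15-8  = src3: modn (bits 15-11), ARn (bits 10-8)
Bits 7-0   = src4: modm (bits 7-3), ARm (bits 2-0)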



The parallel addressing mode (P) field specifies how the operands are to
be used, i.e., whether they are source or destination. The specific
relationship between the P field and the operands is detailed in the
description of the individual parallel instructions (see Chapter 11). However,
the operands are always encoded in the same way. Bits 31 and 30 are set to
the value of 10, indicating parallel addressing mode instructions. Bits 25 and
24 specify the parallel addressing mode (P) field, which defines how bits
21-0 are to be interpreted for addressing the src operands. Bits 21-19 are
used to define the src1 address, bits 18-16 to define the src2 address, bits
15-8 the src3 address, and bits 7-0 the src4 address. The notations modn
and modm indicate which modification field goes with which ARn or ARm
(auxiliary register) field, respectively. The parallel addressing operands are
listed below.



src1 = Rn   (0 ≤ n ≤ 7 for extended-precision registers R0-R7)

src2 = Rn   (0 ≤ n ≤ 7 for extended-precision registers R0-R7)

d1          If 0, dst1 is R0. If 1, dst1 is R1.

d2          If 0, dst2 is R2. If 1, dst2 is R3.

P           0 ≤ P ≤ 3

src3        indirect (disp = 0, 1, IR0, IR1)

src4        indirect (disp = 0, 1, IR0, IR1)


As in the three-operand addressing mode, indirect addressing in the parallel
addressing mode allows for displacements of 0 or 1 and the use of the index
registers (IR0 and IR1). The displacement of 1 is implied and is not explicitly
coded in the instruction word.

In the encoding shown for this mode in Figure 5-6, if the src3 and src4 fields
use the same auxiliary register, both addresses are correctly generated, but
only the value created by the src3 field is saved in the auxiliary register
specified. The assembler issues a warning if you specify this condition.
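Using the bit assignments listed for Figure 5-6, the sketch below extracts the parallel-instruction fields and flags the case in which src3 and src4 name the same auxiliary register, the situation the assembler warns about. The function names are hypothetical, and register results are returned as the machine values of Table 5-1 (R0 = 00h through R7 = 07h).

```c
#include <stdbool.h>
#include <stdint.h>

static bool par_is_parallel(uint32_t w)  { return ((w >> 30) & 3u) == 2u; } /* bits 31-30 = 10 */
static unsigned par_p(uint32_t w)        { return (w >> 24) & 3u; }          /* bits 25-24 */
static unsigned par_dst1(uint32_t w)     { return (w >> 23) & 1u; }          /* d1: R0 or R1 */
static unsigned par_dst2(uint32_t w)     { return 2u + ((w >> 22) & 1u); }   /* d2: R2 or R3 */
static unsigned par_src1(uint32_t w)     { return (w >> 19) & 7u; }          /* R0-R7 */
static unsigned par_src2(uint32_t w)     { return (w >> 16) & 7u; }          /* R0-R7 */
static unsigned par_src3_mod(uint32_t w) { return (w >> 11) & 0x1Fu; }       /* modn */
static unsigned par_src3_ar(uint32_t w)  { return (w >> 8) & 7u; }           /* ARn */
static unsigned par_src4_mod(uint32_t w) { return (w >> 3) & 0x1Fu; }        /* modm */
static unsigned par_src4_ar(uint32_t w)  { return w & 7u; }                  /* ARm */

/* Same auxiliary register in src3 and src4: only the src3 update is kept. */
static bool par_shared_ar(uint32_t w) { return par_src3_ar(w) == par_src4_ar(w); }
```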

5.2.4 Conditional-Branch Addressing Modes 

Instructions using the conditional-branch addressing modes (Bcond,
BcondD, CALLcond, DBcond, and DBcondD) can perform a variety of
conditional operations. Bits 31-27 are set to the value of 01101, indicating
conditional-branch addressing mode instructions. Bit 26 is set to 0 or 1; the
former selects DBcond, the latter Bcond. Bit 25 selects the conditional-branch
addressing mode (B): if B = 0, register addressing is used; if B = 1,
PC-relative addressing is used. Bit 21 sets the type of branch: D = 0 for a
standard branch or D = 1 for a delayed branch. The condition field (cond)
specifies the condition checked to determine what action to take, i.e.,
whether or not to branch (see Table 11-8 on page 11-12 for a list of condition
codes). Figure 5-7 shows the encoding for conditional-branch addressing.

Figure 5-7. Encoding for Conditional-Branch Addressing Modes 
[Figure: Bcond (D) encoding]
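The bit assignments stated for Figure 5-7 are enough for a minimal field check. The sketch below (hypothetical names) tests only bits 31-27, 26, 25, and 21; the position of the cond field is not given in this section and is therefore not decoded.

```c
#include <stdbool.h>
#include <stdint.h>

static bool cb_is_cond_branch(uint32_t w) { return ((w >> 27) & 0x1Fu) == 0x0Du; } /* bits 31-27 = 01101 */
static bool cb_is_bcond(uint32_t w)       { return ((w >> 26) & 1u) != 0; }          /* 1: Bcond, 0: DBcond */
static bool cb_is_pc_relative(uint32_t w) { return ((w >> 25) & 1u) != 0; }          /* B: 1 PC-relative, 0 register */
static bool cb_is_delayed(uint32_t w)     { return ((w >> 21) & 1u) != 0; }          /* D: 1 delayed, 0 standard */
```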
5.3 Circular Addressing

Many algorithms, such as convolution and correlation, require the
implementation of a circular buffer in memory. In convolution and correlation,
the circular buffer is used to implement a sliding window that contains the
most recent data to be processed. As new data is brought in, the new data
overwrites the oldest data. The key to the implementation of a circular buffer
is the implementation of a circular addressing mode. This section describes
the circular addressing mode of the TMS320C40.

The block-size register (BK) specifies the size of the circular buffer. The
bottom of the circular buffer is specified by the first 1 bit (counting from the
most significant bit to the least significant bit) in the lower 16 bits of the
BK register, together with a user-selected auxiliary register (ARn). With the
location of the first 1 bit specified as bit N, the address at the top of the
buffer is referred to as the effective base (EB); EB is equal to bits 31
through N + 1 of ARn, with bits N through 0 of EB being zero.
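
To make the bit arithmetic concrete, here is a small Python model (not TMS320C40 code) of how EB and the address of the last buffer element follow from BK and ARn. The function name and the example address 2003h are hypothetical, and the model assumes that the buffer holds BK words starting at EB, so that its last element is at EB + BK - 1.

```python
# Behavioral model, not TMS320C40 code; the function name is hypothetical.
# Assumption: the buffer holds BK words starting at EB.
def circular_buffer_bounds(arn, bk):
    """Return (EB, address of the last element) for a circular buffer via ARn."""
    block = bk & 0xFFFF                       # only the lower 16 bits of BK count
    if block == 0:
        raise ValueError("BK must be nonzero")
    n = block.bit_length() - 1                # bit N: the first 1 bit from the MSB
    low_mask = (1 << (n + 1)) - 1             # bits N..0
    eb = (arn & 0xFFFF_FFFF) & ~low_mask      # bits 31..N+1 of ARn, low bits zero
    return eb, eb + block - 1


print([hex(a) for a in circular_buffer_bounds(0x2003, 6)])   # ['0x2000', '0x2005']
```

Because EB always has bits N through 0 cleared, a buffer whose size is not a power of two still starts on a 2^(N+1) boundary.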

Figure 5-8 illustrates the relationships among the block-size register (BK), 
the auxiliary registers (ARn), the bottom of the circular buffer, the top of the 
circular buffer, and the index into the circular buffer. 

Figure 5-8. Flowchart for Circular Addressing

In circular addressing, index refers to the N LSBs of the auxiliary register
selected, and step is the quantity being added to or subtracted from the
auxiliary register. Follow these two rules when you use circular addressing:

□ The step used must be less than or equal to the block-size. 

□ The first time the circular queue is addressed, the auxiliary register must 
be pointing to an element in the circular queue. 

The algorithm for circular addressing is as follows:

If 0 ≤ index + step < BK:
   index = index + step.

Else if index + step ≥ BK:
   index = index + step - BK.

Else if index + step < 0:
   index = index + step + BK.
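
The rule above can be checked with a short Python model. It is a sketch, not TMS320C40 code: circular_step is a hypothetical name, and the model assumes that the index occupies bits N through 0 of ARn (N being the most significant 1 bit of BK) while the upper bits, which form EB, are left unchanged. The last lines replay the post-modify sequence of Figure 5-10 (AR0 = 0, BK = 6).

```python
# Behavioral model, not TMS320C40 code; names are hypothetical.
# Assumption: the index is bits N..0 of ARn and the upper bits (EB) stay fixed.
def circular_step(arn, step, bk):
    """Return ARn after adding `step` under the circular-addressing rule."""
    block = bk & 0xFFFF                         # only the lower 16 bits of BK are used
    if abs(step) > block:
        raise ValueError("step must be less than or equal to the block size")
    low_mask = (1 << block.bit_length()) - 1    # bits N..0 hold the index
    eb = arn & ~low_mask                        # effective base: left unchanged
    index = arn & low_mask
    new = index + step
    if 0 <= new < block:
        index = new
    elif new >= block:
        index = new - block
    else:                                       # new < 0
        index = new + block
    return eb | index


# Replay Figure 5-10: each access uses AR0, then post-modifies it.
ar0, bk = 0, 6
accessed = []
for step in (5, 2, -3, 6, -1):      # ++(5)%, ++(2)%, --(3)%, ++(6)%, --%
    accessed.append(ar0)
    ar0 = circular_step(ar0, step, bk)
accessed.append(ar0)                # final *AR0 (no modification)
print(accessed)                     # [0, 5, 1, 4, 4, 3]
```

Note that the case index + step = BK falls in the second branch and wraps to index 0, which is why that comparison must be ≥ rather than >.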



Figure 5-9 shows how the circular buffer is implemented. It illustrates the 
relationship of the quantities generated and the elements in the circular 
buffer. 



Figure 5-9. Circular Buffer Implementation 

Figure 5-10 gives an example of the operation of circular addressing.
Assuming that all ARs are four bits, let AR0 = 0000₂ and BK = 0110₂ (block
size of 6). This example shows a sequence of modifications and the resulting
value of AR0. It also shows how the pointer steps through the circular queue
with a variety of step sizes (both incrementing and decrementing).



Figure 5-10. Circular Addressing Example

   *AR0++(5)%    ; AR0 = 0   (0th value)
   *AR0++(2)%    ; AR0 = 5   (1st value)
   *AR0--(3)%    ; AR0 = 1   (2nd value)
   *AR0++(6)%    ; AR0 = 4   (3rd value)
   *AR0--%       ; AR0 = 4   (4th value)
   *AR0          ; AR0 = 3   (5th value)

   Value            Data
   0th        ->    Element 0
   2nd        ->    Element 1
                    Element 2
   5th        ->    Element 3
   4th, 3rd   ->    Element 4
   1st        ->    Element 5 (Last Element)

Circular addressing is especially useful for the implementation of FIR filters.
Figure 5-11 shows one possible data structure for FIR filters. Note that the
initial value of AR0 points to h(N-1), and the initial value of AR1 points to
x(0). Circular addressing is used in the TMS320C40 code for the FIR filter
shown in Figure 5-12.



Figure 5-11. Data Structure for FIR Filters

          Impulse Response          Input Samples
   AR0 -> h(N-1)                    x(N-1)
          h(N-2)                    x(N-2)
          ...                       ...
          h(1)                      x(2)
          h(0)                      x(1)
                             AR1 -> x(0)



Figure 5-12. FIR Filter Code Using Circular Addressing

   * Initialization
   *
           LDI     N,BK            ; Load block size.
           LDI     H,AR0           ; Load pointer to impulse response.
           LDI     X,AR1           ; Load pointer to bottom of input
                                   ; sample buffer.
   TOP     LDF     IN,R3           ; Read input sample.
           STF     R3,*AR1++%      ; Store with other samples,
                                   ; and point to top of buffer.
           LDF     0,R0            ; Initialize R0.
           LDF     0,R2            ; Initialize R2.
   *
   * Filter
   *
           RPTS    N-1             ; Repeat next instruction.
           MPYF3   *AR0++%,*AR1++%,R0
        || ADDF3   R0,R2,R2        ; Multiply and accumulate.
           ADDF    R0,R2           ; Last product accumulated.
   *
           STF     R2,Y            ; Save result.
           B       TOP             ; Repeat.

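
The following Python sketch mimics the data flow of Figure 5-12 so that the circular pointer arithmetic can be tested off-target. It is a model under stated assumptions, not a translation of the assembly: buffer offsets are taken relative to each buffer's EB (so modulo arithmetic stands in for the circular rule with a step of 1), the coefficient buffer holds h(N-1) through h(0) as in Figure 5-11, and the sample buffer starts out zeroed. fir_circular is a hypothetical name.

```python
# Behavioral model, not TMS320C40 code; fir_circular is a hypothetical name.
# Assumption: offsets are relative to each buffer's EB, so % N stands in for
# the circular rule (step of 1, which never exceeds BK = N).
def fir_circular(h, samples):
    """Model of Figure 5-12: y(n) = sum over k of h(k) * x(n - k).

    h       -- coefficients h(0) .. h(N-1)
    samples -- input samples, processed one per pass through TOP
    """
    n_taps = len(h)                      # plays the role of BK = N
    coeffs = list(reversed(h))           # buffer at AR0: h(N-1), ..., h(0)
    xbuf = [0.0] * n_taps                # input sample buffer, initially zero
    ar0 = 0                              # AR0 offset from the coefficient EB
    ar1 = 0                              # AR1 offset from the sample EB
    out = []
    for x in samples:
        xbuf[ar1] = x                    # STF R3,*AR1++%
        ar1 = (ar1 + 1) % n_taps         # AR1 now addresses the oldest sample
        acc = 0.0                        # LDF 0,R0 / LDF 0,R2
        for _ in range(n_taps):          # RPTS N-1: N multiply-accumulates
            acc += coeffs[ar0] * xbuf[ar1]
            ar0 = (ar0 + 1) % n_taps     # *AR0++%
            ar1 = (ar1 + 1) % n_taps     # *AR1++%
        out.append(acc)                  # STF R2,Y
    return out


print(fir_circular([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))   # [1.0, 2.0, 3.0, 0.0]
```

Each pass makes exactly N moves through both N-word buffers, so AR0 and AR1 end every pass where they started (apart from the one-word advance of AR1 for the new sample). This is why the loop at TOP needs no pointer reloads.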
5.4 Bit-Reversed Addressing

Bit-reversed addressing on the TMS320C40 improves execution speed and
reduces program memory for FFT algorithms that use a variety of radices. One
auxiliary register points to the physical location of a data value. IR0
specifies one-half the size of the FFT; that is, the value contained in IR0
must be equal to 2^(n-1), where n is an integer and the FFT size is 2^n. When
you add IR0 to the auxiliary register by using bit-reversed addressing,
addresses are generated in a bit-reversed fashion. The largest index for
bit-reversed addressing is 00FF FFFFh.

To illustrate this kind of addressing, assume 8-bit auxiliary registers. Let
AR2 contain the value 0110 0000₂ (96₁₀). This is the base address of the
data in memory. Let IR0 contain the value 0000 1000₂ (8₁₀). Figure 5-13
shows a sequence of modifications of AR2 and the resulting values of AR2.



Figure 5-13. Bit-Reversed Addressing Example

   *AR2++(IR0)B    ; AR2 = 0110 0000   (0th value)
   *AR2++(IR0)B    ; AR2 = 0110 1000   (1st value)
   *AR2++(IR0)B    ; AR2 = 0110 0100   (2nd value)
   *AR2++(IR0)B    ; AR2 = 0110 1100   (3rd value)
   *AR2++(IR0)B    ; AR2 = 0110 0010   (4th value)
   *AR2++(IR0)B    ; AR2 = 0110 1010   (5th value)
   *AR2++(IR0)B    ; AR2 = 0110 0110   (6th value)
   *AR2            ; AR2 = 0110 1110   (7th value)
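
A Python model of the sequence in Figure 5-13 follows. It treats bit-reversed addition as reverse-carry addition, in which the carry propagates from the most significant bit toward the least significant bit. That is an assumption of this sketch that agrees with Figure 5-13, and the function names are hypothetical. The register width is a parameter (8 bits here, as in the example).

```python
# Behavioral model, not TMS320C40 code; function names are hypothetical.
# Assumption: bit-reversed addition is modeled as reverse-carry addition.
def bit_reverse(value, width):
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_add(arn, ir0, width):
    """ARn + IR0 with the carry propagating from the MSB toward the LSB."""
    mask = (1 << width) - 1
    total = bit_reverse(arn & mask, width) + bit_reverse(ir0 & mask, width)
    return bit_reverse(total & mask, width)    # a carry past bit 0 is dropped


ar2, ir0 = 0b0110_0000, 0b0000_1000          # 96 and 8, as in the example
seq = []
for _ in range(8):                           # record each address AR2 takes
    seq.append(ar2)
    ar2 = bit_reversed_add(ar2, ir0, width=8)
print([format(v, "08b") for v in seq])
# ['01100000', '01101000', '01100100', '01101100',
#  '01100010', '01101010', '01100110', '01101110']
```

Reversing the low four bits of each step, as in bit_reverse(step, 4), reproduces the Bit-Reversed Step column of Table 5-4.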



Table 5-4 shows the relationship of the index steps and the four LSBs of 
AR2. As you can see, you can find the four LSBs by reversing the bit pattern 
of the steps. 

Table 5-4. Index Steps and Bit-Reversed Addressing

   Step   Bit Pattern   Bit-Reversed Pattern   Bit-Reversed Step
     0       0000             0000                     0
     1       0001             1000                     8
     2       0010             0100                     4
     3       0011             1100                    12
     4       0100             0010                     2
     5       0101             1010                    10
     6       0110             0110                     6
     7       0111             1110                    14
     8       1000             0001                     1
     9       1001             1001                     9
    10       1010             0101                     5
    11       1011             1101                    13
    12       1100             0011                     3
    13       1101             1011                    11
    14       1110             0111                     7
    15       1111             1111                    15

5.5 System and User Stack Management

The TMS320C40 provides a dedicated system stack pointer (SP) for build- 
ing stacks in memory. The auxiliary registers can also be used to build a vari- 
ety of more general linear lists. This section discusses the implementation 
of the following types of linear lists: 

Stack A linear list for which all insertions and deletions are made at one 
end of the list. 

Queue A linear list for which all insertions are made at one end of the 
list, and all deletions are made at the other end. 

Dequeue A double-ended queue linear list for which insertions and dele- 
tions are made at either end of the list. 

The system stack pointer (SP) is a 32-bit register that contains the address
of the top of the system stack. The system stack fills from low-memory
addresses to high-memory addresses (see Figure 5-14). The SP always points
to the last element pushed onto the stack. A push performs a preincrement,
and a pop performs a postdecrement of the system stack pointer.

The program counter is pushed onto the system stack on subroutine calls, 
traps, and interrupts. It is popped from the system stack on returns. The sys- 
tem stack can be pushed and popped with the PUSH, POP, PUSHF, and 
POPF instructions. 



Figure 5-14. System Stack Configuration 

5.5.1 Stacks

Stacks can be built from low to high memory or from high to low memory. Two
cases for each type of stack are shown. You can build stacks by using the
preincrement/predecrement and postincrement/postdecrement modes of
modifying the auxiliary registers (ARs). You can implement stack growth from
high to low memory in two ways:

Case 1: Store to memory using *--ARn to push data onto the stack, and
read from memory using *ARn++ to pop data off the stack.

Case 2: Store to memory using *ARn-- to push data onto the stack, and
read from memory using *++ARn to pop data off the stack.

Figure 5-15 illustrates these two cases. The only difference is that in case
1, the AR always points to the top of the stack, and in case 2, the AR always
points to the next free location on the stack.



Figure 5-15. Implementations of High-to-Low Memory Stacks 

You can implement stack growth from low to high memory in two ways:

Case 3: Store to memory using *++ARn to push data onto the stack, and
read from memory using *ARn-- to pop data off the stack.

Case 4: Store to memory using *ARn++ to push data onto the stack, and
read from memory using *--ARn to pop data off the stack.

Figure 5-16 shows these two cases. In case 3, the AR always points to
the top of the stack. In case 4, the AR always points to the next free location
on the stack.



Figure 5-16. Implementations of Low-to-High Memory Stacks 
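
The four cases differ only in whether the push pre-modifies or post-modifies ARn and in which direction ARn moves; the pop then uses the opposite choice in the opposite direction. The Python sketch below encodes that table. The class and table names are hypothetical, and a dictionary stands in for memory. Case 3 matches the system stack convention described earlier: a push preincrements, a pop postdecrements, and the pointer always addresses the last element pushed.

```python
# Behavioral model, not TMS320C40 code; names are hypothetical and a dict
# stands in for word-addressed memory.
# case -> (direction the stack grows, True if a push pre-modifies ARn)
STACK_CASES = {
    1: (-1, True),    # push *--ARn, pop *ARn++   (ARn -> top of stack)
    2: (-1, False),   # push *ARn--, pop *++ARn   (ARn -> next free word)
    3: (+1, True),    # push *++ARn, pop *ARn--   (ARn -> top of stack)
    4: (+1, False),   # push *ARn++, pop *--ARn   (ARn -> next free word)
}


class AuxStack:
    """A user stack in a word-addressed memory model, driven by one ARn."""

    def __init__(self, case, arn):
        self.step, self.push_pre = STACK_CASES[case]
        self.arn = arn
        self.mem = {}

    def _modify(self, step, pre):
        """Return the address used by the access and update ARn."""
        if pre:                       # *++ARn / *--ARn: modify, then access
            self.arn += step
            return self.arn
        address = self.arn            # *ARn++ / *ARn--: access, then modify
        self.arn += step
        return address

    def push(self, value):
        self.mem[self._modify(self.step, self.push_pre)] = value

    def pop(self):
        # A pop undoes a push: opposite direction, opposite pre/post choice.
        return self.mem[self._modify(-self.step, not self.push_pre)]


s = AuxStack(case=3, arn=0x2FF)
s.push(7)
print(hex(s.arn), s.pop())        # 0x300 7
```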

5.5.2 Queues and Dequeues

The implementations of queues and dequeues are based on the manipulation
of the auxiliary registers used for user stacks. For queues, two auxiliary
registers are used: one to mark the front of the queue, from which data is
popped, and the other to mark the rear of the queue, where data is pushed.

For dequeues, two auxiliary registers are also necessary. One is used to
mark one end of the dequeue, and the other is used to mark the other end.
Data can be popped or pushed from either end.

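
One possible arrangement, sketched in Python below, keeps one pointer on the element at the low-address end and the other one word past the high-address end. This convention, and the AR modify mode each operation is compared to in the comments, are assumptions made for illustration; the text does not fix them. Pushing at the high end and popping at the low end gives a queue; using all four operations gives a dequeue.

```python
# Behavioral model, not TMS320C40 code; names are hypothetical.
# Hypothetical convention: low -> first element, high -> one past the last.
class AuxDequeue:
    """Dequeue in a word-addressed memory model driven by two pointers."""

    def __init__(self, start):
        self.low = start             # one auxiliary register
        self.high = start            # the other auxiliary register
        self.mem = {}

    def __len__(self):
        return self.high - self.low

    def push_low(self, value):       # like *--ARlow: predecrement, store
        self.low -= 1
        self.mem[self.low] = value

    def pop_low(self):               # like *ARlow++: read, postincrement
        if len(self) == 0:
            raise IndexError("pop from empty dequeue")
        value = self.mem.pop(self.low)
        self.low += 1
        return value

    def push_high(self, value):      # like *ARhigh++: store, postincrement
        self.mem[self.high] = value
        self.high += 1

    def pop_high(self):              # like *--ARhigh: predecrement, read
        if len(self) == 0:
            raise IndexError("pop from empty dequeue")
        self.high -= 1
        return self.mem.pop(self.high)


# Used as a queue: push at the rear (high end), pop at the front (low end).
q = AuxDequeue(0x1000)
for sample in (10, 20, 30):
    q.push_high(sample)
print(q.pop_low(), q.pop_low())      # 10 20
```

A linear layout like this drifts through memory as items pass through; combining it with the circular addressing of Section 5.3 is one way to keep it inside a fixed buffer.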
The TMS320C40 provides a complete set of constructs that facilitate
software and hardware control of the program flow. Software control includes
repeats, branches, calls, traps, and returns. Hardware control includes
reset and interrupts. Because programming includes a variety of constructs,
you can select the one suited for your particular application.

Several interlocked-operation instructions provide flexible multiprocessor
support and, through the use of external signals, a powerful means of
synchronization. They also guarantee the integrity of the communication
and result in a high-speed operation.

The TMS320C40 supports a nonmaskable external reset signal and a
number of internal and external interrupts. These functions can be
programmed for a particular application.

This chapter discusses the following major topics: 

Section Page 

6.1 Repeat Modes 6-2 

■ Initialization 6-2/6-4 

■ Operation 6-4 

6.2 Delayed Branches 6-7 

6.3 Calls, Traps, Branches, Jumps, and Returns 6-9 

6.4 Unifying Traps and Interrupts 6-11 

6.5 Interlocked Operations 6-13 

6.6 Reset Operation 6-18 

6.7 Interrupts 6-23 

■ Interrupt Control Bits 6-24 

■ Prioritization and Control 6-24 

6.1 Repeat Modes

The repeat modes of the TMS320C40 can implement zero-overhead loop- 
ing. For many algorithms, most execution time is spent in an inner kernel 
of code. Using the repeat modes allows these time-critical sections of code 
to be executed in the shortest possible time. 

The TMS320C40 provides three instructions to support zero-overhead
looping: RPTB and RPTBD (repeat a block of code, standard and delayed) and
RPTS (repeat a single instruction):

□ RPTB and RPTBD cause a block of code to be repeated a specified
number of times, and

□ RPTS causes a single instruction to be repeated a number of times and
reduces the bus traffic by fetching the instruction only once.

Three registers (RS, RE, and RC) control the updating of the program counter
in repeat mode, as described in Table 6-1 below.

Table 6-1. Repeat-Mode Registers

   Register   Function
   RS         Repeat start-address register. Holds the address of the first
              instruction of the block of code to be repeated.
   RE         Repeat end-address register. Holds the address of the last
              instruction of the block of code to be repeated.
   RC         Repeat-count register. Contains one less than the number of
              times the block remains to be repeated.



6.1.1 Repeat-Mode Initialization 

Two bits are important to the operation of RPTB, RPTBD, and RPTS: the
RM and S bits.

□ The RM (repeat-mode flag) bit in the status register specifies whether
the processor is running in the repeat mode.

■ If RM = 0, fetches are not made in repeat mode.

■ If RM = 1, fetches are made in repeat mode.

□ The S bit is internal to the processor and cannot be programmed, but
this bit is necessary to fully describe the operation of RPTB and RPTS.

■ If RM = 1 and S = 0, RPTB or RPTBD is executing. Program fetches
are from memory.

■ If RM = 1 and S = 1, RPTS is executing. After the first fetch (from
memory), program fetches are from the instruction register (IR).

The correct operation of the repeat modes requires that all of the above
registers and status register fields be initialized correctly. The RPTB, RPTBD,
and RPTS instructions perform this initialization in slightly different ways
(see subsections 6.1.2 and 6.1.3).

6.1.2 RPTB and RPTBD Initialization

The execution sequences of RPTB src and RPTBD src are nearly the same;
each instruction:

1) Loads the start address of the block into RS (repeat start address
register).

a) For RPTB, this is the next address following the instruction:

PC of RPTB + 1 -> RS
or

b) For RPTBD, this is the fourth address following the instruction:

PC of RPTBD + 4 -> RS

2) Loads the end address of the block into RE (repeat end address
register).

a) In PC-relative mode, the end address is formed by adding the 24-bit src
operand to the PC:

For RPTB,

src + PC of RPTB + 1 -> RE
or

For RPTBD,

src + PC of RPTBD + 3 -> RE

b) In register mode, the contents of the src register are the end address:

contents of src register -> RE

3) Sets the status register to indicate the repeat mode of operation:

1 -> RM status register bit (repeat-mode flag)

4) Indicates that this is the repeat block mode of operation:

0 -> S bit (bit is internal to processor; not programmable)
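
The address arithmetic of steps 1 and 2 can be written as a small Python function, shown here as a checking aid rather than as TMS320C40 code. The name rptb_registers is hypothetical, pc is the address of the RPTB or RPTBD instruction itself, and src is assumed to be already sign-extended when it is a PC-relative displacement.

```python
# Behavioral model, not TMS320C40 code; rptb_registers is a hypothetical name.
# Assumption: a PC-relative src is already sign-extended.
def rptb_registers(pc, src, delayed=False, register_mode=False):
    """Return (RS, RE) as loaded by RPTB (delayed=False) or RPTBD (delayed=True).

    pc  -- address of the RPTB/RPTBD instruction itself
    src -- PC-relative displacement, or the contents of the src register
           when register_mode is True
    """
    rs = pc + (4 if delayed else 1)                # step 1: start of the block
    if register_mode:
        re = src                                   # step 2b: register contents
    else:
        re = src + pc + (3 if delayed else 1)      # step 2a: PC-relative end
    return rs, re


print([hex(r) for r in rptb_registers(0x100, 0x0F)])                # ['0x101', '0x110']
print([hex(r) for r in rptb_registers(0x100, 0x0F, delayed=True)])  # ['0x104', '0x112']
```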

The last piece of information required is the number of times to repeat the
block. This value is determined by properly initializing the RC (repeat count)
register. Because the execution of RPTB and RPTBD does not load RC, you
must load this register yourself. A typical setup of the block repeat operation
is shown below.

   LDI    15,RC      ; 15 -> RC
   RPTB   LOOP       ; LOOP -> RE, PC + 1 -> RS, 1 -> RM, 0 -> S

In a typical operation, the repeat modes repeat a block of code at least once.
The repeat counter should be loaded with one less than the number of times
to execute the block; for example, an RC value of 0 executes the block of code
once, and an RC value of 4 executes the block five times. All block repeats
initiated by RPTB or RPTBD can be interrupted.

6.1.3 RPTS Initialization

When RPTS src is executed, the following sequence of operations occurs: 

1) PC + 1 -> RS

2) PC + 1 -> RE

3) 1 -> RM status register bit

4) 1 -> S bit

5) src -> RC (repeat count register)

The RPTS instruction loads all registers and mode bits necessary for the
operation of the single-instruction repeat mode. Step 1 loads the start address
of the block into RS. Step 2 loads the end address into RE (the end address
of the block). Since this is a repeat of a single instruction, the start address
and the end address are the same. Step 3 sets the status register to indicate
the repeat mode of operation. Step 4 indicates that this is the repeat
single-instruction mode of operation. Step 5 loads src into RC.

Repeats of a single instruction initiated by RPTS are not interruptible, be- 
cause the RPTS fetches the instruction word only once and then keeps it 
in the instruction register for reuse. An interrupt would cause the instruction 
word to be lost. The refetching of the instruction word from the instruction 
register reduces memory accesses and, in effect, acts as a one-word pro- 
gram cache. If it is necessary to have a single instruction that is repeatable 
and interruptible, you can use the RPTB instruction. 

6.1.4 Repeat-Mode Operation 

Information in the repeat-mode registers and associated control bits is used
to control the modification of the PC when fetches are being made in repeat
mode. The repeat modes compare the contents of the RE register (repeat end
address register) with the program counter (PC). If they match, the repeat
counter is decremented; if it is still nonnegative, the PC is loaded with the
repeat start address, and processing continues. The fetches and appropriate
status bits are modified as necessary. Note that the repeat counter (RC) is
never modified when the repeat-mode flag (RM) is 0. The maximum number of
repeats occurs when RC = 8000 0000h. This results in 8000 0001h repetitions.
The detailed algorithm for the update of the PC is shown in Figure 6-1.

The RPTB and RPTS instructions are four-cycle instructions. These four cycles
of overhead are incurred only on the first pass through the loop. All subsequent
passes through the loop are accomplished with zero cycles of overhead. In
Example 6-1, the block of code from STLOOP to ENDLOP is repeated sixteen
times.

Example 6-1. RPTB Operation

           LDI     15,RC       ; Load repeat counter with 15
           RPTB    ENDLOP      ; Execute the block of code
   STLOOP                      ; from STLOOP to ENDLOP 16 times
           .
           .
           .
   ENDLOP



Figure 6-1. Repeat-Mode Control Algorithm

   if RM == 1                         ; If in repeat mode (RPTB or RPTS)
      if S == 1                       ; If RPTS
         if first time through        ; If this is the first fetch
            fetch instruction from memory
         else                         ; If not the first fetch
            fetch instruction from IR
            RC - 1 -> RC              ; Decrement RC
            if RC < 0                 ; If RC is negative,
                                      ; repeat-single mode is completed
               0 -> ST(RM)            ; Turn off repeat-mode bit
               0 -> S                 ; Clear S
               PC + 1 -> PC           ; Increment PC
      else if S == 0                  ; If RPTB
         fetch instruction from memory
         if PC == RE                  ; If this is the end of the block
            RC - 1 -> RC              ; Decrement RC
            if RC >= 0                ; If RC is not negative
               RS -> PC               ; Set PC to start of block
            else if RC < 0            ; If RC is negative
               0 -> ST(RM)            ; Turn off repeat-mode bit
               0 -> S                 ; Clear S
               PC + 1 -> PC           ; Increment PC
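
The RPTB half of Figure 6-1 can be simulated in a few lines of Python to confirm the counting rule: a block runs RC + 1 times. This is a sketch, not TMS320C40 code, under the assumption of straight-line code inside the block (no branches or interrupts); run_rptb is a hypothetical name, and it counts fetches of the last instruction of the block.

```python
# Behavioral model, not TMS320C40 code; run_rptb is a hypothetical name.
# Assumption: straight-line code between RS and RE (no branches, no interrupts).
def run_rptb(rs, re, rc, max_fetches=100_000):
    """Follow the S == 0 (RPTB) branch of Figure 6-1.

    Returns how many times the instruction at RE is fetched, that is, how
    many passes through the block are made.
    """
    rm, pc, passes = 1, rs, 0
    for _ in range(max_fetches):
        if rm == 0:                  # repeat mode turned off: loop finished
            return passes
        # fetch instruction from memory at pc (the instruction is not modeled)
        if pc == re:                 # end of the block
            passes += 1
            rc -= 1                  # RC - 1 -> RC
            if rc >= 0:
                pc = rs              # RS -> PC
            else:
                rm = 0               # 0 -> ST(RM), 0 -> S
                pc += 1              # PC + 1 -> PC
        else:
            pc += 1                  # ordinary sequential fetch
    raise RuntimeError("max_fetches exceeded")


print(run_rptb(rs=0x100, re=0x103, rc=15))   # 16, as in Example 6-1
```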



Using the repeat block mode of modifying the PC facilitates analysis of what
would happen in the case of branches within the block. Assume that the next
value of the PC will be either PC + 1 or the contents of the RS register. It is
thus apparent that this method of block repeat allows branching within the
repeated block. Execution can go anywhere within the user's code via
interrupts, subroutine calls, etc. For proper modification of the loop counter,
the last instruction of the loop must be fetched. You can stop the repeating of
the loop prior to completion by writing 0 into the repeat counter or into the
RM bit of the status register.

Since the block repeat modes modify the program counter, other instructions
cannot modify the program counter at the same time. Two rules apply:

Rule 1: The last instruction in the block (or the only instruction
in a block of size one) cannot be a Bcond, DBcond, CALL,
CALLcond, TRAPcond, RETIcond, RETScond, IDLE, RPTB,
or RPTS. Example 6-2 shows an incorrectly placed standard
branch.

Rule 2: None of the last four instructions from the bottom of the
block (or the only instruction in a block of size one) can be a
BcondD, BRD, DBcondD, RPTBD, LAJ, LAJcond, LATcond,
BcondAF, BcondAT, or RETIcondD. Example 6-3 shows an
incorrectly placed delayed branch.

If either of these rules is violated, the PC will be undefined.

Example 6-2. Incorrectly Placed Standard Branch

           LDI     15,RC       ; Load repeat counter with 15
           RPTB    ENDLOP      ; Execute block of code
   STLOOP                      ; from STLOOP to ENDLOP 16 times
           .
           .
   ENDLOP  BR      OOPS        ; This branch violates rule 1



Example 6-3. Incorrectly Placed Delayed Branch

           LDI     15,RC       ; Load repeat counter with 15
           RPTB    ENDLOP      ; Execute block of code
   STLOOP                      ; from STLOOP to ENDLOP 16 times
           .
           .
           BRD     OOPS        ; This branch violates rule 2
           ADDF
           MPYF
   ENDLOP  SUBF



Block repeats (RPTB and RPTBD) are nestable. Since all of the control is
defined by the RS, RE, RC, and ST registers, these registers must be saved
and restored in order to nest block repeats. The status register RM bit can be
used to determine whether the block repeat mode is active. For example,
if you write an interrupt service routine that requires the use of RPTB or
RPTBD, the interrupt associated with the routine may occur during another
block repeat. The interrupt service routine can check the RM bit. If this bit is
set, the interrupt routine saves RS, RE, RC, and ST. The interrupt routine can
then perform a block repeat. Before returning to the interrupted routine, the
interrupt routine restores RS, RE, RC, and ST. If the RM bit is not set, you
don't need to save and restore these registers.

6.2 Delayed Branches



The TMS320C40 offers two main types of branching: standard and delayed. 
Standard branches empty the pipeline before performing the branch; this 
guarantees correct management of the program counter and results in a 
TMS320C40 branch taking four cycles. Included in this class are repeats, 
calls, returns, and traps. 

Delayed branches without annulling do not empty the pipeline but, rather, 
guarantee that the next three instructions will execute before the program 
counter is modified by the branch. Delayed branches with annulling may 
conditionally annul the next three instructions. The result is a branch that re- 
quires only a single cycle, thus making the speed of the delayed branch very 
close to the optimal block repeat modes of the TMS320C40. However, un- 
like block repeat modes, delayed branches may be used in situations other 
than looping. Every delayed branch has a standard branch counterpart that
is used when a delayed branch cannot be used. The delayed branches without
annulling are BcondD, BRD, and DBcondD. Those with annulling are
BcondAT and BcondAF.

Conditional delayed branches use the conditions, reflected in the status reg- 
ister, that existed at the end of the instruction preceding the branch. They 
do not depend upon the instructions following the delayed branch. Delayed 
branches without annulling guarantee that the next three instructions will ex- 
ecute, regardless of other pipeline conflicts. 

When a delayed branch is fetched, it remains pending until the three
instructions that follow are executed. None of the three instructions
immediately after a delayed branch can be any of the following (see
Example 6-4):

   Bcond       BRD         IDLE        RETIcondD
   BcondD      DBcond      LAJ         RETScond
   BcondAF†    DBcondD     LAJcond     RPTB
   BcondAT†    CALL        LATcond     RPTBD
   BR          CALLcond    RETIcond    RPTS
                                       TRAPcond

† BcondAF and BcondAT are described in Section 6.3 on page 6-9.

Incorrectly used delayed branches can leave the PC undefined.

Delayed branches disable interrupts until the three instructions following the 
delayed branch are completed. This is independent of whether or not the 
branch is taken. 

Example 6-4. Incorrectly Placed Delayed Branches

   B1:   BD    L1
   B2:   B     L2      ; This branch is incorrectly placed



□ The BcondAT and BcondAF instructions both branch if the condition is
true, but:

■ BcondAT annuls (cancels the effect of, except for the time delay) the
execute phase of the next three instructions following BcondAT. Then it
takes the branch. If cond is false, execution continues immediately after
the BcondAT.

■ BcondAF first executes the next three instructions following the
BcondAF. Then it takes the branch. If cond is false, execution continues
immediately after the BcondAF, but the execute phase of the first three
instructions is annulled.

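
The three flavors of delayed branch reduce to a small truth table. The Python function below (a hypothetical name, not a TMS320C40 API) returns whether the three instructions after the branch take effect and whether the branch is taken, following the descriptions above. It ignores timing, since annulled instructions still consume their cycles.

```python
# Behavioral model, not TMS320C40 code; the function name is hypothetical.
# Timing is ignored: annulled instructions still take their cycles.
def delayed_branch_outcome(kind, cond):
    """Return (next_three_take_effect, branch_taken).

    kind: "D"  for BcondD  (no annulling)
          "AT" for BcondAT (annuls the next three when cond is true)
          "AF" for BcondAF (annuls the next three when cond is false)
    """
    if kind == "D":
        return True, cond            # the next three always execute
    if kind == "AT":
        return not cond, cond        # true: annul the next three, then branch
    if kind == "AF":
        return cond, cond            # false: annul the next three, fall through
    raise ValueError(f"unknown delayed-branch kind: {kind!r}")


for kind in ("D", "AT", "AF"):
    for cond in (True, False):
        print(kind, cond, delayed_branch_outcome(kind, cond))
```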
6.3 Calls, Traps, Branches, Jumps, and Returns

Calls and traps provide a means of executing a subroutine or function while 
providing a return to the calling routine. 

The CALL, CALLcond, and TRAPcond instructions store the value of the
PC on the stack before changing the PC's contents. The RETScond or
RETIcond (standard or delayed) instructions return execution from traps and
calls using the value on the stack.

□ CALL places the next PC value on the stack and places the src (source)
operand into the PC. The src is a 24-bit PC-relative or register value.
Figure 6-2 shows CALL response timing.

□ CALLcond is similar to the CALL instruction (above) except that (1) it
executes only if a specific condition is true (the 20 conditions, including
unconditional, are listed in Section 11.2 on page 11-10) and (2) the src is
either a PC-relative displacement or a register (register addressing
mode).

□ TRAPcond also executes only if a specific condition is true (same
conditions as for the CALLcond instruction). When it executes, (1)
interrupts are disabled with 0 written to bit GIE of the ST, (2) the next PC
value is stored on the stack, and (3) a vector is retrieved from one of the
addresses from 20h to 3Fh and loaded into the PC. The particular address
corresponds to a trap number in the instruction. Using RETIcond or
RETIcondD to return re-enables interrupts if the status register's GIE
bit was set previously.

□ RETScond returns execution from any of the above three instructions
by popping the top of the stack to the PC. For RETScond to execute,
the specified condition must be true. Conditions are the same as for the
CALLcond instruction.

□ RETIcond returns from traps or calls in the same way as RETScond
(above), except that RETIcond also copies the PGIE and PCF bit values
into the GIE and CF bits of the status register. Conditions are the same as
for the CALLcond instruction.

□ RETIcondD returns from traps or calls in the same way as RETIcond
(above), except that RETIcondD first executes the next three instructions
immediately following the RETIcondD. Conditions are the same as for the
CALLcond instruction.

□ Link and jump (LAJ), link and jump conditional (LAJcond), and link and
trap conditional (LATcond) each provide a return address in the
extended-precision register R11.

■ After it executes the three instructions that follow it, LAJ jumps to
an address derived by the concatenation of the most significant 8
bits of the PC and the 24-bit src address in the instruction.

■ The LAJcond destination address is either PC-relative (a
displacement) or the contents of a specified register. If the condition
is true, LAJcond first executes the three instructions following the
LAJcond before making the jump. If the condition is not true, execution
continues immediately after the LAJcond instruction.

■ After it executes the three instructions that follow it, LATcond calls
one of the 512 available trap vectors pointed to by the trap vector
table pointer (TVTP) (see Section 3.2 on page 3-15). The vector value
is loaded into the PC.
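
The LAJ target computation is a simple concatenation; in Python it could look like the sketch below. laj_target is a hypothetical helper, and the sketch assumes a 32-bit PC; exactly which PC value the hardware uses is also an assumption here.

```python
# Behavioral model, not TMS320C40 code; laj_target is a hypothetical helper.
# Assumption: a 32-bit PC, taken as the value the instruction sees.
def laj_target(pc, src):
    """LAJ jump target: bits 31..24 of the PC concatenated with the 24-bit src."""
    return (pc & 0xFF00_0000) | (src & 0x00FF_FFFF)


print(hex(laj_target(0x1234_5678, 0xABCDEF)))   # 0x12abcdef
```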

Functionally, calls and traps accomplish the same task: a subfunction is
called and executed, and control is then returned to the calling function.
Traps offer two advantages:

1) Interrupts are automatically disabled when a trap is executed. This
allows critical code to execute without risk of being interrupted. Thus,
traps are usually terminated with a RETIcond or RETIcondD instruction
to re-enable interrupts if the status register GIE bit was set previously.

2) You can use traps to indirectly call functions. This is particularly
beneficial when a kernel of code contains the basic subfunctions to be used
by applications. In this case, the functions in the kernel can be modified
and relocated without recompiling each application.



Figure 6-2. CALL Response Timing

6.4 Unifying Traps and Interrupts



Traps and interrupts on the TMS320C4x are unified in all forms of operation 
except initialization. 

6.4.1 Initialization 

At initialization: 

□ Traps are always triggered by a software mechanism, either with
TRAPcond (conditional trap) or LATcond (link and trap conditionally,
delayed).

□ Interrupts are triggered by hardware events (that is, external interrupts,
DMA interrupts, or communication channel interrupts).

6.4.2 Operation

Figure 6-3 shows the unified flow of traps and interrupts.

For an interrupt, step (1) in the figure happens after completion of the last
instruction that was fetched before completion of the interrupt flush. This
guarantees later restoration of correct flag values.

Figure 6-3. Unified Flow of Traps and Interrupts

   Entry: Interrupt Received, or Trap Executed (TRAPcond or LATcond)
   Exit, step (3): Return Executed (RETIcond or RETIcondD)

LATcond (link and trap conditionally) is a delayed instruction that provides
a single-cycle trap that is very useful for error detection and correction.
Since LATcond is a delayed instruction, the three instructions following
LATcond should not modify the GIE or CF status register bits (this could
result in storing incorrect values of these two bits).

The RETIcond and RETIcondD instructions manipulate the status flags as
shown in step (3) in the figure. RETIcondD provides a delayed return from
a trap or interrupt. Since traps and interrupts are unified, RETIcond
provides a return from either.

In general, you should not directly modify the PGIE or PCF status register 
bits except when putting the status register on a stack for recursive inter- 
rupts or traps. 

6.5 Interlocked Operations

One of the most common parallel processing configurations is the sharing 
of global memory by multiple processors. In order for multiple processors 
to access this global memory and share data in a coherent manner, some 
sort of arbitration or handshake is necessary. The TMS320C40 interlocked 
operations meet this requirement for arbitration. More details are given in 
Section 7.7 on page 7-39. Examples in this section show you how inter- 
locked operations can be used to implement: 

□ A busy-waiting loop used to synchronize processors at the software
level (Example 6-5, page 6-15),

□ A counter shared between cooperative processors defining the number 
of times a task should be done by the processors (Example 6-6 on page 
6-15), 

□ Semaphores used to ease the programming of critical sections 
(Example 6-7 and Example 6-8 on page 6-16). 

The TMS320C40 has five instructions referred to as interlocked operations. 
Through the use of external signals, these instructions provide powerful 
synchronization mechanisms. They also guarantee the integrity of the com- 
munication and result in a high-speed operation. The interlocked-operation 
instruction group is listed in Table 6-2. 

Table 6-2. Interlocked Operations

   Instruction   Description                                Operation
   LDFI          Load floating-point value from memory      Signal interlocked
                 into a register, interlocked when          src -> dst
                 external memory is accessed
   LDII          Load integer from memory into a            Signal interlocked
                 register, interlocked when external        src -> dst
                 memory is accessed
   SIGI          Signal, interlocked                        Signal interlocked
                                                            Clear interlock
   STFI          Store floating-point value from a          src -> dst
                 register to memory, interlocked when       Clear interlock
                 external memory is accessed
   STII          Store integer from a register to memory,   src -> dst
                 interlocked when external memory is        Clear interlock
                 accessed


The interlocked operations use the global- and local-bus signals, LOCK and
LLOCK, to reflect a currently executing interlocked operation. These signals
are active (low) when any of the interlocked instructions in Table 6-2 are
executing.
The external timing for the interlocked loads and stores is the same as for
standard loads and stores. You can extend the interlocked loads and stores
like standard accesses by using the appropriate ready signal (RDYx or
LRDYx).

The LDFI and LDII instructions perform the following actions:

1) Pull (L)LOCK low.

2) Execute an LDF or LDI instruction.

3) Extend the read cycle until the appropriate ready signal is received.
Complete the instruction.

4) Leave (L)LOCK active low until changed by an STFI, STII, or SIGI. 

The read/write operation is identical to any other read/write cycle except for
the special use of (L)LOCK. The src operand for LDFI and LDII is always
a direct or indirect memory address. (L)LOCK is set to 0 only if the src is
located off-chip (i.e., STRB(0,1) or LSTRB(0,1) is active). If on-chip memory
is accessed, then (L)LOCK is not asserted, and the operation is the same as
an LDF or LDI from internal memory.

The STFI and STII instructions perform the following operations:

1) Begin a write cycle. The state of (L)LOCK does not change. If it is low,
an interlocked operation occurs. If high, the operation is as if an STF or
STI is performed (not interlocked).

2) Execute an STF or STI instruction and extend the write cycle until the
appropriate ready signal is received.

3) After the write cycle, bring (L)LOCK inactive (high). 

As in the case of LDFI and LDII, the dst of STFI and STII affects (L)LOCK.
If dst is located off-chip (STRB(0,1) or LSTRB(0,1) is active), (L)LOCK is set
to 1. If on-chip memory is accessed, then (L)LOCK is not asserted, and
the operations are the same as an STF or STI to internal memory.

The SIGI instruction functions as follows:

1) Pulls (L)LOCK low. 

2) Executes an LDI instruction. 

3) Extends the read cycle until the appropriate ready signal is received.
Completes the instruction.

4) Brings (L)LOCK back inactive high. 
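As a reading aid, the C sketch below models only the (L)LOCK behavior described above for the five interlocked instructions; it is not a description of the actual bus logic. The names (`ilock_op`, `bus_state`, `apply_op`) are hypothetical, and the sketch assumes that an on-chip SIGI operand, like the other four instructions, leaves (L)LOCK unchanged, a case the text does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one (L)LOCK pin, limited to this section's rules. */
typedef enum { LDFI, LDII, SIGI, STFI, STII } ilock_op;

typedef struct {
    bool lock_low;  /* true while (L)LOCK is asserted (low) */
} bus_state;

/* Apply one interlocked instruction; return true if the access itself was
   interlocked. off_chip is true when STRB(0,1) or LSTRB(0,1) is active. */
static bool apply_op(bus_state *bus, ilock_op op, bool off_chip)
{
    assert(bus != NULL);
    if (!off_chip)
        return false;          /* on-chip: plain LDF/LDI/STF/STI, (L)LOCK unchanged */

    switch (op) {
    case LDFI:
    case LDII:
        bus->lock_low = true;  /* pull (L)LOCK low and leave it low */
        return true;
    case SIGI:
        bus->lock_low = false; /* low during its LDI read, then back high */
        return true;
    case STFI:
    case STII: {
        bool was_low = bus->lock_low;  /* interlocked only if already low */
        bus->lock_low = false;         /* inactive (high) after the write */
        return was_low;
    }
    }
    return false;
}
```

For example, an off-chip LDII followed by an off-chip STII reports both accesses as interlocked and leaves (L)LOCK high; a second STII is then an ordinary, non-interlocked store.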

Interlocked operations can be used to implement a busy-waiting loop, to
manipulate a multiprocessor counter, to implement a simple semaphore
mechanism, or to perform synchronization between two TMS320C40s. The
following examples illustrate the usefulness of the interlocked instructions.
Example 6-5 shows the implementation of a busy-waiting loop. If location
LOCK is the interlock for a critical section of code, and a nonzero value
means the lock is busy, the busy-waiting loop can be implemented as shown.

Example 6-5. Busy-Waiting Loop

        LDI     1,R0        ; Put 1 in R0
L1:     LDII    @LOCK,R1    ; Load lock value into R1
        STII    R0,@LOCK    ; Set lock value to 1
        BNZ     L1          ; If lock is not 0, read it again
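As a rough analog of Example 6-5 on a modern shared-memory machine, the C11 sketch below replaces the LDII/STII pair with a single `atomic_exchange`, which reads the old lock value and writes 1 indivisibly. This is a swapped-in technique (a C11 test-and-set spinlock), not TMS320C40 code. The type and function names are hypothetical, and `release` (storing 0 when leaving the critical section) is an assumption, since Example 6-5 shows only the acquiring loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_int value;  /* 0 = free, nonzero = busy (as for location LOCK) */
} shared_lock;

/* One pass of the loop body: read the old value and write 1 in one
   indivisible step. True means the lock was free and is now ours. */
static bool try_acquire(shared_lock *lk)
{
    assert(lk != NULL);
    return atomic_exchange(&lk->value, 1) == 0;
}

/* Busy-wait until the lock is obtained. */
static void acquire(shared_lock *lk)
{
    while (!try_acquire(lk)) {
        /* lock was busy: read it again (the BNZ back to L1) */
    }
}

/* Hypothetical release step: store 0 to free the lock. */
static void release(shared_lock *lk)
{
    assert(lk != NULL);
    atomic_store(&lk->value, 0);
}
```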



Example 6-6 shows how a location COUNT may contain a count of the
number of times a particular operation needs to be performed. This
operation may be performed by any processor in the system. If the count is
zero, the processor waits until it is nonzero before beginning processing.
The example also shows the algorithm for modifying COUNT correctly.

Example 6-6. Task Counter Manipulation

WAIT:   LDI     0,R0        ; Put 0 in R0
        LDII    @COUNT,R1   ; Read current value of counter
        BZD     WAIT        ; If COUNT = 0, try again
        LDINZ   1,R0        ; If COUNT not zero, put 1 in R0 (delay slot)
        SUBI    R0,R1       ; Decrement R1 by R0 (delay slot)
        STII    R1,@COUNT   ; Update COUNT, clear interlock (delay slot)
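The counter logic of Example 6-6 can be sketched in C11 as follows. The interlocked read-modify-write is replaced here by a compare-and-exchange loop, a different mechanism that gives the same indivisible "decrement only if nonzero" effect on one shared word. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One pass through Example 6-6: if COUNT is zero, fail (the caller retries);
   otherwise decrement COUNT indivisibly and claim one task. */
static bool try_take_task(atomic_int *count)
{
    assert(count != NULL);
    int old = atomic_load(count);
    while (old != 0) {
        /* Succeeds only if no other processor changed COUNT meanwhile;
           on failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return true;
    }
    return false;  /* COUNT was zero */
}

/* Whole example: spin until a task is available, then take it. */
static void take_task(atomic_int *count)
{
    while (!try_take_task(count)) {
        /* COUNT == 0: wait for another processor to make it nonzero */
    }
}
```

`try_take_task` corresponds to one pass through the loop; `take_task` adds the branch back to WAIT.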



Figure 6-4 illustrates multiple TMS320C40s sharing global memory and 
using the interlocked instructions as in Example 6-7 and Example 6-8. 



Figure 6-4. Multiple TMS320C40s Sharing Global Memory 
Example 6-7. Implementation of V(S)

V:      LDII    @S,R0       ; Read S (interlock begun)
        ADDI    1,R0        ; S + 1
        STII    R0,@S       ; S + 1 -> S (interlock cleared)



Example 6-8. Implementation of P(S)

P:      LDI     0,R0        ; Put 0 in R0
        LDII    @S,R1       ; Read semaphore's current value
        BZD     P           ; If S = 0, go to P and try again
        LDINZ   1,R0        ; If S is not 0, decrement it
        SUBI    R0,R1
        STII    R1,@S       ; Update S
Sometimes it may be necessary for several processors to access some
shared data or other common resources. The portion of code that must ac- 
cess the shared data is called a critical section. 

To ease the programming of critical sections, semaphores may be used. 
Semaphores are variables that can take only nonnegative integer values. 
Two primitive, indivisible operations are defined on semaphores (with S be- 
ing a semaphore): 

V(S):   S + 1 -> S

P(S):   P: if (S == 0), go to P
           else S - 1 -> S

Indivisibility of V(S) and P(S) means that when these processes access and 
modify the semaphore S, they are the only processes doing so. 

To enter a critical section, a P operation is performed on a common sema- 
phore, e.g., S (S is initialized to 1). The first processor performing P(S) will 
be able to enter its critical section. All other processors are blocked because 
S has become 0. After leaving its critical section, the processor performs a 
V(S), thus allowing another processor to execute P(S) successfully. 

The TMS320C40 code for V(S) is shown in Example 6-7, and code for P(S) 
is shown in Example 6-8. Compare the code in Example 6-8 to the code 
in Example 6-6. 
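The P and V primitives and the critical-section protocol described above can be sketched with C11 atomics as follows. This is an illustrative analog, not TMS320C40 code: `sem_try_p` makes one attempt at P(S) using compare-and-exchange in place of LDII/STII, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_int s;  /* semaphore value, never negative */
} semaphore;

/* V(S): S + 1 -> S as one indivisible step. */
static void sem_v(semaphore *sem)
{
    assert(sem != NULL);
    atomic_fetch_add(&sem->s, 1);
}

/* One attempt at P(S): fail if S == 0, else S - 1 -> S indivisibly. */
static bool sem_try_p(semaphore *sem)
{
    assert(sem != NULL);
    int old = atomic_load(&sem->s);
    while (old > 0) {
        if (atomic_compare_exchange_weak(&sem->s, &old, old - 1))
            return true;
    }
    return false;
}

/* P(S): "if (S == 0), go to P". */
static void sem_p(semaphore *sem)
{
    while (!sem_try_p(sem)) {
        /* blocked: another processor is in its critical section */
    }
}
```

With S initialized to 1, the first `sem_try_p` succeeds (that processor enters its critical section), a second attempt fails while S is 0, and after `sem_v` another processor's `sem_p` completes.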
6.6 Reset Operation

The TMS320C40 supports a nonmaskable external reset signal (RESET), 
which is used to perform system reset. This section discusses the reset op- 
eration. 

At powerup, the state of the TMS320C40 processor is undefined. You can
use the RESET signal to place the processor in a known state. This signal
must be asserted low for 10 or more H1 clock cycles to guarantee a system
reset. H1 is an output clock signal generated by the TMS320C40 (see
Chapter 13 for more information).
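The reset-width rule is simple arithmetic: the minimum low time is 10 times the H1 period. The sketch below takes the H1 period of your system as an input; the 40-ns value used in the example is an assumption for illustration only, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum number of H1 cycles RESET must stay low (from the text above). */
#define RESET_MIN_H1_CYCLES 10u

/* Minimum RESET low time, in ns, for a given H1 period in ns. */
static double min_reset_low_ns(double h1_period_ns)
{
    assert(h1_period_ns > 0.0);
    return RESET_MIN_H1_CYCLES * h1_period_ns;
}

/* True if a RESET pulse of reset_low_ns is long enough to guarantee reset. */
static bool reset_pulse_ok(double reset_low_ns, double h1_period_ns)
{
    return reset_low_ns >= min_reset_low_ns(h1_period_ns);
}
```

With an assumed 40-ns H1 period, the minimum is 400 ns, so a 300-ns pulse would not guarantee a reset.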

Reset affects the other pins on the device in either a synchronous or
asynchronous manner. The synchronous reset is gated by the
TMS320C40's internal clocks. The asynchronous reset directly affects the
pins, and it is faster than the synchronous reset. Table 6-3 shows the state
of the TMS320C40's pins after RESET = 0. Each pin is described according
to whether the pin is reset synchronously or asynchronously.

Table 6-3. Pin Operation at Reset

Signal          Pins  Type    Description

Global Bus External Interface (80 pins)
D(31-0)         32    I/O/T   Synchronous reset. Placed in high-impedance state.
DE              1     I       Reset has no effect.
A(30-0)         31    O/T     Synchronous reset. Placed in high-impedance state.
AE              1     I       Reset has no effect.
STAT(3-0)       4     O       Synchronous reset. Set to all ones.
LOCK            1     O       Synchronous reset. Set to one.
STRB0           1     O/T     Synchronous reset. Set to one.
R/W0            1     O/T     Synchronous reset. Set to one.
PAGE0           1     O/T     Synchronous reset. Set to zero.
RDY0            1     I       Reset has no effect.
CE0             1     I       Reset has no effect.
STRB1           1     O/T     Synchronous reset. Set to one.
R/W1            1     O/T     Synchronous reset. Set to one.
PAGE1           1     O/T     Synchronous reset. Set to zero.
RDY1            1     I       Reset has no effect.
CE1             1     I       Reset has no effect.

Local Bus External Interface (80 pins)
LD(31-0)        32    I/O/T   Synchronous reset. Placed in high-impedance state.
LDE             1     I       Reset has no effect.
LA(30-0)        31    O/T     Synchronous reset. Placed in high-impedance state.
LAE             1     I       Reset has no effect.
LSTAT(3-0)      4     O       Synchronous reset. Set to all ones.
LLOCK           1     O       Synchronous reset. Set to one.
LSTRB0          1     O/T     Synchronous reset. Set to one.
LR/W0           1     O/T     Synchronous reset. Set to one.
LPAGE0          1     O/T     Synchronous reset. Set to zero.
LRDY0           1     I       Reset has no effect.
LCE0            1     I       Reset has no effect.
LSTRB1          1     O/T     Synchronous reset. Set to one.
LR/W1           1     O/T     Synchronous reset. Set to one.
LPAGE1          1     O/T     Synchronous reset. Set to zero.
LRDY1           1     I       Reset has no effect.
LCE1            1     I       Reset has no effect.

Communication Port 0 Interface (12 pins)
C0D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ0           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK0           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB0          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY0           1     I/O     Asynchronous reset. Placed in high-impedance state.

Communication Port 1 Interface (12 pins)
C1D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ1           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK1           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB1          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY1           1     I/O     Asynchronous reset. Placed in high-impedance state.

Communication Port 2 Interface (12 pins)
C2D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ2           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK2           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB2          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY2           1     I/O     Asynchronous reset. Placed in high-impedance state.

Communication Port 3 Interface (12 pins)
C3D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ3           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK3           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB3          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY3           1     I/O     Asynchronous reset. Placed in high-impedance state.

Communication Port 4 Interface (12 pins)
C4D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ4           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK4           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB4          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY4           1     I/O     Asynchronous reset. Placed in high-impedance state.

Communication Port 5 Interface (12 pins)
C5D(7-0)        8     I/O     Synchronous reset. Placed in high-impedance state.
CREQ5           1     I/O     Asynchronous reset. Placed in high-impedance state.
CACK5           1     I/O     Asynchronous reset. Placed in high-impedance state.
CSTRB5          1     I/O     Asynchronous reset. Placed in high-impedance state.
CRDY5           1     I/O     Asynchronous reset. Placed in high-impedance state.

Interrupts, I/O Flags, Reset, Timer (12 pins)
IIOF(3-0)       4     I/O     Asynchronous reset. Placed in high-impedance state.
NMI             1     I       Reset has no effect.
IACK            1     O       Synchronous reset.
RESET           1     I       RESET input pin.
RESETLOC(1,0)   2     I       Reset has no effect.
ROMEN           1     I       Reset has no effect.
TCLK0           1     I/O     Asynchronous reset. Placed in high-impedance state.
TCLK1           1     I/O     Asynchronous reset. Placed in high-impedance state.

Clock and Power (4 pins)
X1              1     O       Reset has no effect.
X2/CLKIN        1     I       Reset has no effect.
H1              1     O       Synchronous reset. Goes to its initial state when RESET
                              makes a 1-to-0 transition.
H3              1     O       Synchronous reset. Goes to its initial state when RESET
                              makes a 1-to-0 transition.

Emulation (7 pins)
TCK             1     I       Reset has no effect.
TDO             1     O       Reset has no effect.
TDI             1     I       Reset has no effect.
TMS             1     I       Reset has no effect.
TRST            1     I       Reset has no effect.
EMU0            1     I/O     Undefined.
EMU1            1     I/O     Undefined.
At system reset, the following additional operations are performed:

□ Timer registers (Section 9.10 on page 9-45) are set as follows:

■ Timer global control register set to zero except that bit DATIN is set 
to the value on pin TCLK. 

■ Timer counter and timer period registers set to zeroes. 

□ Communications port control registers (subsection 8.4.1 on page 8-9) 
set to zeroes. 

□ External memory interface control registers (Section 7.2 on page 7-6)
are set to 3E39 FFF0h, except that bits 3-0 take the values on the AE,
DE, CE1, and CE0 pins.

□ DMA channel control register, DMA transfer counter, and DMA auxiliary 
transfer counter (subsection 9.3.1 on page 9-7) are set to zeroes. 

□ The following CPU registers are loaded with zeroes (each described in 
Chapter 3): 

■ ST (CPU status register) 

■ IIE (CPU internal interrupt enable register)

■ IIF (interrupt flag register; controls pins IIOF(3-0))

■ DIE (DMA internal enable register) 

■ IVTP (interrupt-vector table pointer) 

■ TVTP (trap-vector table pointer) 

□ Then the reset vector is read from its location and loaded into the PC. 
This vector contains the start address of the system reset routine. 

□ Execution begins. Refer to Section 12.1 on page 12-3 for an example 
of a processor initialization routine. 

Multiple TMS320C40s driven by the same system clock may be reset and
synchronized. When the 1-to-0 transition of RESET occurs, the processor
is placed on a well-defined internal phase, and all of the TMS320C40s will
come up on the same internal phase.
6.7 Interrupts



The TMS320C40 supports multiple internal and external interrupts, which 
can be used for a variety of applications. This section discusses the opera- 
tion of these interrupts. Additional information regarding internal interrupts 
can be found in Section 8.4 (page 8-8), Section 8.6 (page 8-17), Table 8-1 
(communication ports on page 8-10), Section 9.9 (DMA on page 9-40), and
Section 9.10 (timers on page 9-45). 

The four external interrupts (IIOF0-IIOF3, as shown in Figure 6-5) are
enabled at the IIE register (subsection 3.1.9, page 3-10). They are
synchronized internally: they are sampled on the falling edge of H1 and
passed through a series of H1/H3 delays. Once synchronized, the interrupt
input sets the corresponding interrupt flag register (IIF) bit if the
interrupt is active. These are the external interrupts and their
corresponding interrupt vectors (the latter shown in Figure 6-5):

Interrupt (IIOF pin)   Vector Location
IIOF0                  IVTP + 003h
IIOF1                  IVTP + 004h
IIOF2                  IVTP + 005h
IIOF3                  IVTP + 006h

These interrupts are prioritized in that one is selected over another if more
than one arrives in the same clock cycle (IIOF0 the highest, IIOF1 next, etc.).
When an interrupt is taken, the status register ST(GIE) bit is reset to 0,
disabling any other incoming interrupt (except NMI, the nonmaskable
interrupt). This prevents any other interrupt (IIOF0-3) from assuming
program control until the ST(GIE) bit is set back to 1. The NMI (an incoming
low on pin AJ5, signal NMI) is not masked by the ST(GIE) bit. On a return
from an interrupt routine, the RETIcond and RETIcondD instructions place the
value that is in the ST(PGIE) bit into the ST(GIE) bit, returning it to its
value before the context switch.

Even though the NMI is nonmaskable, it is temporarily masked during
delayed branches and multicycle CPU operations. NMI is a negative-going,
edge-triggered, latched interrupt.

External interrupts can be effectively either edge- or level-triggered,
depending on how the TYPE fields are set in the IIF register (see Table 3-6
on page 3-13). An external interrupt must be held low for at least one H1/H3
cycle to be recognized by the TMS320C40. For level-triggered interrupts,
if the interrupt is held low for between one and two cycles, then only one
interrupt is recognized. If the interrupt is held low for two or more cycles,
more than one interrupt may be recognized, depending on how rapidly
interrupts are serviced.
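The GIE/PGIE handling described in this section can be sketched in C as below. Only the two status-register bits are modeled; the step that copies GIE into PGIE when an interrupt is taken is inferred from the statement that RETIcond restores GIE "to its value before the context switch", and the type and function names are hypothetical. The temporary masking of NMI during delayed branches is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool gie;   /* ST(GIE): global interrupt enable */
    bool pgie;  /* ST(PGIE): previous GIE, restored on return */
} status_bits;

/* Maskable requests need GIE = 1; NMI is not masked by GIE. */
static bool can_take(const status_bits *st, bool is_nmi)
{
    assert(st != NULL);
    return is_nmi || st->gie;
}

/* Taking an interrupt: remember GIE in PGIE (inferred), then clear GIE so
   no other maskable interrupt can assume program control. */
static void take_interrupt(status_bits *st)
{
    assert(st != NULL);
    st->pgie = st->gie;
    st->gie = false;
}

/* RETIcond / RETIcondD: copy PGIE back into GIE. */
static void return_from_interrupt(status_bits *st)
{
    assert(st != NULL);
    st->gie = st->pgie;
}
```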
6.7.1 Interrupt Control Bits

When a particular interrupt is processed by the CPU or DMA controller, the
corresponding interrupt flag bit is cleared by the internal interrupt
acknowledge signal. Note, however, that for level-triggered interrupts,
if IIOFn is still low when the interrupt acknowledge signal occurs, the
interrupt flag bit will be cleared for only one cycle and then set again
because IIOFn is still low. Accordingly, it is theoretically possible that,
depending on when the IIF register (described in subsection 3.1.10 on page
3-12) is read, this bit may be zero even though IIOFn is zero. When the
TMS320C40 is reset, zero is written to the interrupt flag register, thereby
clearing all pending interrupts.

The interrupt flag register bits may be read and written to under software 
control. If, at the IIF register, FUNCx = 0 and TYPEx = 1, then external pin
IIOFx can be written to. Writing a 1 to the IIF register FLAGx bit has the same 
effect as an incoming interrupt received on the corresponding pin. In this 
way, all interrupts may be triggered and/or cleared through software. Since 
the interrupt bits also may be read (TYPEx = 0), the interrupt pins may be 
polled in software when an interrupt-driven interface is not required. 

Internal interrupts operate in a similar manner. In the IIF register, the bit
corresponding to an internal interrupt (e.g., TINT0, TINT1) may be read and
written to through software. Writing a 1 sets the interrupt latch, and writing 
a 0 clears it. All internal interrupts are one H1/H3 cycle in length. 

The CPU global interrupt enable bit (GIE), located in the CPU status register 
(ST), controls all CPU interrupts. All DMA interrupts are controlled by the 
DMA enable register bits and the SYNC bits of the DMA channel control reg- 
isters (described in Figure 9-2 and Table 9-1 on page 9-8). The DMA in- 
terrupts are not dependent upon ST(GIE) and are local to the DMA. 

To provide for maximum performance in servicing interrupts, the interrupt
acknowledge (IACK) instruction is provided. IACK drives the IACK pin and 
performs a dummy read. The read is performed from the address specified 
by the IACK instruction operand. When IACK is used, it typically is placed 
in the early portion of an interrupt service routine. For certain applications, 
it may be better suited at the end of the interrupt service routine or be totally 
unnecessary. 

6.7.2 Prioritization and Control 

The prioritization of interrupts is handled by the CPU according to the
interrupt vector table shown in Figure 6-5. Prioritization is according to
position in the table: those with displacements closest to the IVTP base
address are higher in priority (i.e., NMI is higher than TINT0, which is
higher than IIOF0, etc.). Note that interrupt TINT0 is located at IVTP + 2,
while the TINT1 vector is after the communication port and DMA coprocessor
interrupts, at IVTP + 2Bh.

Prioritization means an interrupt in a higher position in the interrupt vector
table (Figure 6-5) will be accepted over one in a lower position when both
are received in the same clock cycle. It does not mean, for example, that
IIOF3 must wait until service routines for IIOF2, IIOF1, and IIOF0 are
completed (when ST(GIE) = 1).
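The selection rule above (the lowest displacement from IVTP wins among requests that arrive in the same cycle) can be sketched in C as follows. The one-bit-per-vector `pending` encoding and the 64-slot scan are assumptions for illustration, not the IIF or IIE register layout; only the NMI, TINT0, IIOF0-IIOF3, and TINT1 displacements come from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Displacements from IVTP named in this section. */
enum {
    VEC_NMI   = 0x01,
    VEC_TINT0 = 0x02,
    VEC_IIOF0 = 0x03,  /* IIOF1-IIOF3 follow at 0x04-0x06 */
    VEC_TINT1 = 0x2B
};

/* pending: bit n set means the request whose vector is at IVTP + n is
   pending and may be taken (hypothetical encoding). Returns the winning
   displacement, or -1 if nothing is pending. Slot 0 is reserved. */
static int select_vector(uint64_t pending)
{
    for (int n = 1; n < 64; n++) {
        if (pending & ((uint64_t)1 << n))
            return n;  /* closest to IVTP = highest priority */
    }
    return -1;
}

/* Address of the vector that is loaded into the PC. */
static uint32_t vector_address(uint32_t ivtp, int displacement)
{
    assert(displacement >= 0);
    return ivtp + (uint32_t)displacement;
}
```

Note that this only arbitrates simultaneous requests; as the text says, it does not make a lower-priority interrupt wait for higher-priority service routines to finish.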

If the DMA coprocessor is not using interrupts for synchronization of trans- 
fers, it will not be affected by the processing of the CPU interrupts. If the CPU 
is involved in a pipeline conflict (branch, register, or memory), it will not re- 
spond to the interrupts until that conflict is resolved. It is therefore possible 
to interrupt the CPU and DMA coprocessor simultaneously with the same 
or different interrupts and, in effect, synchronize their activities. For
example, it may be necessary to cause a high-priority DMA coprocessor transfer
that avoids bus conflicts with the CPU, i.e., makes the DMA coprocessor a
higher priority than the CPU. This may be accomplished by using an
interrupt that causes the CPU to trap to an interrupt routine that contains an IDLE
instruction. Then, if the same interrupt is used to synchronize DMA 
coprocessor transfers, the DMA coprocessor transfer counter can be used 
to generate an interrupt and, thus, return control to the CPU following the 
DMA coprocessor transfer. 

Since the DMA coprocessor and CPU share the same set of interrupt flags, 
the DMA coprocessor may clear an interrupt flag before the CPU can re- 
spond to it. For example, if the CPU interrupts are disabled, the DMA 
coprocessor can respond to interrupts and thus clear the associated inter- 
rupt flags. 

Note the following situations: 

□ If there is a delayed branch in the pipeline, interrupts are held pending 
until after the branch. 

□ If the interrupt occurs in the first cycle of the fetch of an instruction, the 
fetched instruction is discarded (not executed), and the address of that 
instruction is pushed to the top of the system stack. 

□ If the interrupt occurs after the first cycle of the fetch (in the case of a
multicycle fetch due to wait states), that instruction is executed, and the
address of the next instruction to be fetched is pushed to the top of the
system stack.

□ If no program fetch is occurring, then no new fetch is performed. 

After the address of the appropriate instruction has been pushed, the
interrupt vector is fetched and loaded into the PC, and execution continues.
Figure 6-5. Interrupt-Vector Table (IVT)

IVTP + 000h   Reserved (Note 1)
IVTP + 001h   NMI (Note 2)
IVTP + 002h   TINT0 (Note 3)
IVTP + 003h   IIOF0 (Note 4)
IVTP + 004h   IIOF1 (Note 4)
IVTP + 005h   IIOF2 (Note 4)
IVTP + 006h   IIOF3 (Note 4)
   ...
IVTP + 013h   OCRDY1 (Note 5)
   ...
IVTP + 01Dh   ICFULL4 (Note 5)
IVTP + 01Eh   ICRDY4 (Note 5)
IVTP + 01Fh   OCRDY4 (Note 5)
IVTP + 020h   OCEMPTY4 (Note 5)
IVTP + 021h   ICFULL5 (Note 5)
IVTP + 022h   ICRDY5 (Note 5)
   ...
              DMA INT0 (Note 6)
   ...
IVTP + 02Bh   TINT1 (Note 3)
IVTP + 02Ch   Reserved



Notes: 1) Reserved for the reset vector when IVTP = 0000 0000h and RESETLOC(1,0) = 00₂, or
when IVTP = 8000 0000h and RESETLOC(1,0) = 10₂. See Table 3-8.

2) NMI (nonmaskable interrupt) is discussed in Section 9.9, page 9-40.

3) Timer interrupts TINT0 and TINT1 are enabled and programmed by the IIE register (subsection
3.1.9, page 3-10) and monitored at the IIF register (subsection 3.1.10, page 3-12).

4) External pins IIOF0-IIOF3 are programmed in the DIE register (subsection 3.1.8, page 3-8)
and the IIF register (subsection 3.1.10, page 3-12).

5) The communication port I/O buffers full/ready interrupts are enabled by the DIE and IIE
registers and also discussed in Table 8-1, page 8-10 (OUTPUT LEVEL & INPUT LEVEL bits).

6) DMA interrupts are enabled at the IIE register and DMA channel control register (at bits TCC
and AUX TCC explained in Table 9-1 on page 9-8).
The TMS320C40 allows the CPU and DMA coprocessor to respond to and
process interrupts in parallel. Figure 6-6 shows interrupt processing flow.
The interrupts are polled, and the CPU and DMA coprocessor begin
processing them. In the interrupt flow pertaining to the CPU (left side of
figure), the interrupt flag corresponding to the highest priority enabled
interrupt is cleared, and GIE is set to 0. The CPU completes all fetched
instructions. The interrupt vector is fetched and loaded into the PC, and the
CPU continues execution. The DMA coprocessor cycle (right side of figure)
is similar to that for the CPU. After the pertinent interrupt flag is cleared,
the DMA coprocessor proceeds according to the status of the SYNC bits in
the DMA channel control register.



Figure 6-6. Interrupt Processing 
External Bus Operation



The TMS320C40 has two identical 80-pin parallel external interfaces: the 
global memory interface and the local memory interface. Each interface 
has the following features: 

□ separate 80-pin configurations, each with its own 32-bit data bus and
31-bit address bus,

□ single-cycle reads and pipelined writes, 

□ independent enable signals for data, address, and control lines, 

□ bus-request and bus-lock signaling for shared-memory parallel
processing,

□ user-controlled mapping of addresses to either of two sets of indepen- 
dent strobes for different speed memories, 

□ look-ahead bus status signals for defining current and requested bus 
operations for parallel processing arbitration, 

□ selectable wait states (both software- and hardware-controlled), 

□ signals that indicate when memory page boundaries are crossed. This
supports

■ page-mode and static-column decode DRAMs, 

■ high-speed SRAM banks, and 

■ slower-speed memory banks and I/O devices. 


Note: Description Covers Both Interfaces in this Chapter 

This chapter covers both the global memory interface and the local memory
interface. However, only the global memory interface is shown throughout
this chapter because it is identical in every way to the local memory
interface except that (1) they have different positions in the memory map, and
(2) the control signals for the local memory interface have an additional 
"L" prefix (as described in Figure 7-1 on page 7-3). 
Principal headings within this chapter:

Section Page 

7.1 Global (and Local) Memory Interface Control Signals 7-3 

7.2 Memory Interface Control Registers 7-6 

7.3 Use of the Global Memory Interface Registers 7-12 

7.4 Programmable Wait States 7-15 

7.5 Timing 7-17 

7.6 Using Enabled Signals to Control Signal Group 7-38 

7.7 Interlocked-Instructions Definition and Timing 7-39

7.8 IACK Timing 7-47
7.1 Global (and Local) Memory Interface Control Signals

As explained in the Note on page 7-1, this text covers the global memory
interface control signals; it also applies to the local memory interface control 
signals (with the exceptions stated in the note). 



Figure 7-1. Global and Local Memory Interface Control Signals

[Diagram of the global memory interface signals: R/W0, STRB0, PAGE0,
RDY0, CE0; D(31-0) (32 lines), DE; A(30-0) (31 lines), AE; STAT(3-0),
LOCK; R/W1, STRB1, PAGE1, RDY1, CE1.]

NOTE: The signals used in this figure are for the global memory interface.
However, local memory interface signals have the same configuration
except that an additional "L" (for local) prefix is added for each signal
(e.g., R/W0 becomes LR/W0, and STRB0 becomes LSTRB0, etc.).



As shown in Figure 7-1, the global memory interface has two sets of control
signals, STRB0 and STRB1. The global memory port control registers
(Section 7.2 on page 7-6) define which set of control signals is active.
Table 7-1. Global Memory Interface Control Signals

Signal†       Type§   Description

R/W(0,1)      O/Z     Specifies memory read (active high) or write (active low) mode.

STRB(0,1)     O/Z     Interface access strobe.

PAGE(0,1)     O/Z     Memory-page enable signal for STRB(0,1) accesses.

RDY(0,1)      I       Indicates external memory is ready to be accessed.

CE(0,1)       I       Control signal enable for R/Wx, STRBx, and PAGEx signals. When
                      high (a one), it places the corresponding R/Wx, STRBx, and PAGEx
                      signals in high-impedance state (x = 0 for CE0 and x = 1 for CE1).

DE            I       When high (a one), places data lines D31-0 in high-impedance state.

AE            I       When high (a one), places address lines A30-0 in high-impedance
                      state.

STAT(3-0)‡    O       Four lines to define status or function of the memory port as shown in
                      Table 7-2 (next page).

LOCK‡         O       Indicates if an interlocked access is underway (0 = access underway;
                      1 = access not underway). LOCK is changed only by the interlocked
                      instructions.

† This table applies to both the global memory interface and local memory interface (local memory
  interface signals have an additional "L" prefix). The numbers in parentheses mean that either a
  0 (zero) or a 1 can follow the prefix shown to the left of the parentheses. A zero indicates STRB0
  control signals (shown in Figure 7-1), and a one indicates STRB1 control signals.
§ O = output; I = input; Z = high impedance (three-stated).
‡ STAT(3-0) and LOCK cannot be placed in the high-impedance state by an external control signal.




Table 7-2 on the next page shows how pins STAT3 to STATO define the cur- 
rent status of the global memory port. For many bus accesses, these signals 
provide information about the access that is about to begin. The code for a 
SIGI instruction read is useful for distinguishing between a SIGI read and
an LDII or LDFI read.

The bus idle status code is 1111₂ (bottom of Table 7-2), which simplifies
modular shared-bus multiprocessor interfaces, because pull-up resistors
can be used to signal the idle condition when processor cards are not
attached to the shared bus.
Table 7-2. Global Memory Port Status for STRB0 and STRB1 Accesses

Value at Pins†                 Status
STAT3  STAT2  STAT1  STAT0

  0      0      0      0       STRB0 access, program read
  0      0      0      1       STRB0 access, data read
  0      0      1      0       STRB0 access, DMA read
  0      0      1      1       STRB0 access, SIGI (instruction) read
  0      1      0      0       Reserved
  0      1      0      1       STRB0 access, data write
  0      1      1      0       STRB0 access, DMA write
  0      1      1      1       Reserved

† This table applies to both the global memory interface and local memory
  interface (for local memory interface signals, add an additional "L" prefix
  such as LSTAT3, LSTAT2, etc.).
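A bus monitor or logic-analyzer script might decode the STAT3-STAT0 codes in Table 7-2 as in the C sketch below. Only the eight STRB0 rows shown in the table are decoded; every code with STAT3 = 1 is reported as not listed rather than guessed. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    ST_PROGRAM_READ, ST_DATA_READ, ST_DMA_READ, ST_SIGI_READ,
    ST_DATA_WRITE, ST_DMA_WRITE, ST_RESERVED, ST_NOT_LISTED
} stat_kind;

/* Pack the four pin levels (STAT3..STAT0) into a 4-bit code. */
static unsigned stat_code(int s3, int s2, int s1, int s0)
{
    return ((unsigned)(s3 != 0) << 3) | ((unsigned)(s2 != 0) << 2) |
           ((unsigned)(s1 != 0) << 1) | (unsigned)(s0 != 0);
}

/* Decode a code using the STRB0 rows of Table 7-2. */
static stat_kind decode_stat(unsigned stat)
{
    assert(stat <= 0xFu);  /* four status pins */
    switch (stat) {
    case 0x0: return ST_PROGRAM_READ;
    case 0x1: return ST_DATA_READ;   /* includes LDII/LDFI reads */
    case 0x2: return ST_DMA_READ;
    case 0x3: return ST_SIGI_READ;
    case 0x5: return ST_DATA_WRITE;
    case 0x6: return ST_DMA_WRITE;
    case 0x4:
    case 0x7: return ST_RESERVED;
    default:  return ST_NOT_LISTED;  /* 1xxx rows are not shown above */
    }
}
```

This is how a monitor can tell a SIGI read (0011) from an ordinary data read such as LDII or LDFI (0001).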
7.2 Memory Interface Control Registers

As explained in the Note on page 7-1, this text covers the global memory
interface control signals; it also applies to the local memory interface control 
signals (with the exceptions stated in the note). 

Figure 7-2 shows the memory map for both the global and local memory 
interface control registers. Each register can be programmed to control its 
respective memory interface by defining: 

□ page sizes for the two strobes, 

□ when strobes are active, 

□ wait states, 

□ other operations that control the memory interface. 

Table 7-3 (on page 7-8) describes the fields in these registers. 

At reset, the binary values shown above each bit in Figure 7-2 are written
to the global memory interface control register. Bits 3-0 take the
values at their respective pins (AE, DE, CE1, and CE0). This reset
condition has the following effects (for the local and global bus):

□ STRB0 and STRB1 (LSTRB0 and LSTRB1) page sizes are set to
00111₂ (256 words).

□ STRB0 and STRB1 (LSTRB0 and LSTRB1) wait states are set to 7
cycles.

□ STRB0 and STRB1 (LSTRB0 and LSTRB1) accesses require an external
ready signal and an internal ready signal generated by the software
wait-state generator.

□ STRB0 (LSTRB0) is active for all addresses over the global (local)
memory interface.

□ Back-to-back reads that switch from STRB0 to STRB1 (or STRB1 to
STRB0) result in the insertion of a single cycle between these reads.

As shown in Figure 7-2, fields STRB1 SWW and STRB0 SWW are both set
to 11₂. As a result, the internal ready signal is the logical AND of RDYwtcnt (from the
on-chip wait-state counter) and external RDY.
Figure 7-2. Format for the Memory-Interface Control Registers

Address                  Register
0010 0000h               Global Memory Interface Control Register (see Note 3)
0010 0001h-0010 0003h    Reserved
0010 0004h               Local Memory Interface Control Register

Bits    Field                                  Reset     Access
31-30   Reserved                               N/A       N/A
29      STRB SWITCH                            1         RW
28-24   STRB ACTIVE (Table 7-5, Table 7-6)     11110     RW
23-19   STRB1 PAGESIZE (Table 7-4)             00111     RW
18-14   STRB0 PAGESIZE (Table 7-4)             00111     RW
13-11   STRB1 WTCNT                            111       RW
10-8    STRB0 WTCNT                            111       RW
7-6     STRB1 SWW                              11        RW
5-4     STRB0 SWW                              11        RW
3       AE                                     pin       R
2       DE                                     pin       R
1       CE1                                    pin       R
0       CE0                                    pin       R

NOTES: 1. The register cell figure (immediately above) contains global memory interface control
register mnemonics. However, local memory interface control register mnemonics
can be visualized by adding an "L" prefix to each mnemonic in the figure (e.g., LSTRB0
SWW, LCE0, etc.).

2. The Reset column gives the binary values written to the register at reset.
The values at bits 3-0 are defined by the values of their respective external pins (AE,
DE, CE1, and CE0).

3. These registers are shown in the overall memory map in Figure 3-9 and Figure 3-10
on pages 3-19 and 3-20, respectively.

4. RW = read/write; R = read only.
Table 7-3. Bit Definitions for Both Memory Interface Control Registers

Bit No.  Mnemonic†        Description†
0        CE0              Value of external pin CE0 (after it passes through an internal
                          synchronizer). The value is not latched.
1        CE1              Value of external pin CE1 (after it passes through an internal
                          synchronizer). The value is not latched.
2        DE               Value of external pin DE (after it passes through an internal
                          synchronizer). The value is not latched.
3        AE               Value of external pin AE (after it passes through an internal
                          synchronizer). The value is not latched.
4-5      STRB0 SWW        Software wait states for STRB0 accesses. In conjunction
                          with STRB0 WTCNT, this field defines the mode of wait-state
                          generation. Actual wait states are explained in Section 7.4
                          and in Table 7-7 on page 7-16.
6-7      STRB1 SWW        Software wait states for STRB1 accesses. In conjunction
                          with STRB1 WTCNT, this field defines the mode of wait-state
                          generation. Actual wait states are explained in Section 7.4
                          and in Table 7-7 on page 7-16.
8-10     STRB0 WTCNT      Software wait-state count for STRB0 accesses. Specifies
                          the number of cycles to use when software wait states are
                          active. The three-bit range is from 000₂ (zero) to 111₂ (seven).
11-13    STRB1 WTCNT      Software wait-state count for STRB1 accesses. Specifies
                          the number of cycles to use when software wait states are
                          active. The three-bit range is from 000₂ (zero) to 111₂ (seven).
14-18    STRB0 PAGESIZE   Page size for STRB0 accesses. Specifies which address MSBs
                          define the current page, and therefore the page size, for
                          STRB0 accesses. See the range table in Table 7-4 on page 7-9.
19-23    STRB1 PAGESIZE   Page size for STRB1 accesses. Specifies which address MSBs
                          define the current page, and therefore the page size, for
                          STRB1 accesses. See the range table in Table 7-4 on page 7-9.
24-28    STRB ACTIVE      Specifies the address ranges over which STRB0 and STRB1
                          are active. See the ranges in Table 7-5 on page 7-10 for STRB
                          ACTIVE and Table 7-6 on page 7-11 for LSTRB ACTIVE.
29       STRB SWITCH      Inserts a single cycle between back-to-back reads that
                          switch from STRB0 to STRB1 (or vice versa).
                          When 1, insert the cycle.
                          When 0, do not insert the cycle.
30-31    Reserved         Read as zeros.

† Mnemonics used are for the global memory interface control register. For the local memory interface control
register, add the prefix "L" to each mnemonic (e.g., LCE0, LCE1, LSTRB1, etc.). The description remains
the same for the local memory interface control register.
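The field layout in Table 7-3 can be checked with a small Python model that packs and unpacks a register image. This is a sketch, not a TI-provided API. The field names are hypothetical Python identifiers for the mnemonics above, and the reserved bits 31-30 are left at zero. The reset image assumes that pins AE, DE, CE1, and CE0 all read as 0. The reset field values come from the reset list in Section 7.2, and STRB ACTIVE = 11110₂ comes from Section 7.3.1.

```python
# (LSB position, width) of each field, taken from Table 7-3.
# The local register uses the same layout with an "L" prefix on each name.
FIELDS = {
    "CE0": (0, 1), "CE1": (1, 1), "DE": (2, 1), "AE": (3, 1),
    "STRB0_SWW": (4, 2), "STRB1_SWW": (6, 2),
    "STRB0_WTCNT": (8, 3), "STRB1_WTCNT": (11, 3),
    "STRB0_PAGESIZE": (14, 5), "STRB1_PAGESIZE": (19, 5),
    "STRB_ACTIVE": (24, 5), "STRB_SWITCH": (29, 1),
}
READ_ONLY = {"CE0", "CE1", "DE", "AE"}  # these bits reflect external pins


def pack(**values: int) -> int:
    """Build a 32-bit register image from writable field values."""
    word = 0
    for name, value in values.items():
        if name in READ_ONLY:
            raise ValueError(f"{name} is read-only (reflects an external pin)")
        lsb, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack(word: int) -> dict:
    """Split a 32-bit register image into its named fields."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FIELDS.items()}


# Reset configuration described in Section 7.2 (pin bits 3-0 assumed to be 0).
RESET_IMAGE = pack(STRB_SWITCH=1, STRB_ACTIVE=0b11110,
                   STRB1_PAGESIZE=0b00111, STRB0_PAGESIZE=0b00111,
                   STRB1_WTCNT=7, STRB0_WTCNT=7,
                   STRB1_SWW=0b11, STRB0_SWW=0b11)
```

Under these assumptions, the reset image works out to 3E39 FFF0h. On real hardware, bits 3-0 reflect the pin levels instead of zero.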
Table 7-4. Page Size as Defined by STRB0/1 PAGESIZE Bits†

STRBx PAGESIZE   External Address Bus    External Address Bus    Page Size
Field‡           Bits Defining the       Bits Defining           (32-Bit Words)
                 Current Page            Address on a Page
00000-00110      Reserved                Reserved                Reserved
00111            30-8                    7-0                     2^8 = 256
01000            30-9                    8-0                     2^9 = 512
01001            30-10                   9-0                     2^10 = 1K
01010            30-11                   10-0                    2^11 = 2K
01011            30-12                   11-0                    2^12 = 4K
01100            30-13                   12-0                    2^13 = 8K
01101            30-14                   13-0                    2^14 = 16K
01110            30-15                   14-0                    2^15 = 32K
01111            30-16                   15-0                    2^16 = 64K
10000            30-17                   16-0                    2^17 = 128K
10001            30-18                   17-0                    2^18 = 256K
10010            30-19                   18-0                    2^19 = 512K
10011            30-20                   19-0                    2^20 = 1M
10100            30-21                   20-0                    2^21 = 2M
10101            30-22                   21-0                    2^22 = 4M
10110§           30-23                   22-0                    2^23 = 8M
10111            30-24                   23-0                    2^24 = 16M
11000            30-25                   24-0                    2^25 = 32M
11001            30-26                   25-0                    2^26 = 64M
11010            30-27                   26-0                    2^27 = 128M
11011            30-28                   27-0                    2^28 = 256M
11100            30-29                   28-0                    2^29 = 512M
11101            30                      29-0                    2^30 = 1G
11110            None                    30-0                    2^31 = 2G
11111            Reserved                Reserved                Reserved

† Mnemonics used are for the global memory interface control register. For the local memory interface control
register, add the prefix "L" to each mnemonic (e.g., LSTRB0 PAGESIZE, LSTRB1 PAGESIZE, etc.). The
description remains the same for the local memory interface control register.

‡ The "x" in STRBx means that the data in the columns apply to STRB0 or STRB1, as well as to LSTRB0 and
LSTRB1, as explained in the note above.

§ An STRBx PAGESIZE field of 10110₂ is depicted in Figure 7-4 on page 7-13.
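Table 7-4 follows a regular pattern. A PAGESIZE field value n (00111₂ through 11110₂) uses external address bits n-0 as the offset within a page, so a page holds 2^(n+1) words and bits 30-(n+1) select the page. The hypothetical Python helper below sketches that pattern. It applies the pattern to the 31-bit external address (A30-A0) and rejects the reserved field values.

```python
def page_split(pagesize_field: int, address: int) -> tuple:
    """Return (page size in words, page number, offset) for a 31-bit address.

    Field value n selects address bits n..0 as the offset on a page,
    so the page holds 2**(n + 1) words and bits 30..n+1 identify the page.
    """
    if not 0b00111 <= pagesize_field <= 0b11110:
        raise ValueError("reserved PAGESIZE value")
    if not 0 <= address < (1 << 31):
        raise ValueError("address must fit in 31 bits (A30-A0)")
    offset_bits = pagesize_field + 1
    size = 1 << offset_bits
    return size, address >> offset_bits, address & (size - 1)
```

For example, page_split(0b10110, address) splits the address exactly as Figure 7-4 shows: bits 30-23 form the page number and bits 22-0 form the offset.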
Table 7-5. Address Ranges Specified by STRB ACTIVE Bits†

STRB ACTIVE   STRB0 Active             STRB0 Active Address   STRB1 Active
Field         Address Range            Range Size             Address Range
00000-01110   Reserved                 Reserved               Reserved
01111         8000 0000 - 8000 FFFF    2^16 = 64K             8001 0000 - FFFF FFFF
10000         8000 0000 - 8001 FFFF    2^17 = 128K            8002 0000 - FFFF FFFF
10001         8000 0000 - 8003 FFFF    2^18 = 256K            8004 0000 - FFFF FFFF
10010         8000 0000 - 8007 FFFF    2^19 = 512K            8008 0000 - FFFF FFFF
10011         8000 0000 - 800F FFFF    2^20 = 1M              8010 0000 - FFFF FFFF
10100         8000 0000 - 801F FFFF    2^21 = 2M              8020 0000 - FFFF FFFF
10101         8000 0000 - 803F FFFF    2^22 = 4M              8040 0000 - FFFF FFFF
10110         8000 0000 - 807F FFFF    2^23 = 8M              8080 0000 - FFFF FFFF
10111         8000 0000 - 80FF FFFF    2^24 = 16M             8100 0000 - FFFF FFFF
11000         8000 0000 - 81FF FFFF    2^25 = 32M             8200 0000 - FFFF FFFF
11001         8000 0000 - 83FF FFFF    2^26 = 64M             8400 0000 - FFFF FFFF
11010         8000 0000 - 87FF FFFF    2^27 = 128M            8800 0000 - FFFF FFFF
11011         8000 0000 - 8FFF FFFF    2^28 = 256M            9000 0000 - FFFF FFFF
11100         8000 0000 - 9FFF FFFF    2^29 = 512M            A000 0000 - FFFF FFFF
11101         8000 0000 - BFFF FFFF    2^30 = 1G              C000 0000 - FFFF FFFF
11110         8000 0000 - FFFF FFFF    2^31 = 2G              None
11111         Reserved                 Reserved               Reserved

† Address ranges specified by the LSTRB ACTIVE bits are listed in Table 7-6.
Table 7-6. Address Ranges Specified by LSTRB ACTIVE Bits†

LSTRB ACTIVE   LSTRB0 Active            LSTRB0 Active Address   LSTRB1 Active
Field          Address Range            Range Size              Address Range
00000-01110    Reserved                 Reserved                Reserved
01111          0000 0000 - 0000 FFFF    2^16 = 64K              0001 0000 - 7FFF FFFF
10000          0000 0000 - 0001 FFFF    2^17 = 128K             0002 0000 - 7FFF FFFF
10001          0000 0000 - 0003 FFFF    2^18 = 256K             0004 0000 - 7FFF FFFF
10010          0000 0000 - 0007 FFFF    2^19 = 512K             0008 0000 - 7FFF FFFF
10011          0000 0000 - 000F FFFF    2^20 = 1M               0010 0000 - 7FFF FFFF
10100          0000 0000 - 001F FFFF    2^21 = 2M               0020 0000 - 7FFF FFFF
10101          0000 0000 - 003F FFFF    2^22 = 4M               0040 0000 - 7FFF FFFF
10110          0000 0000 - 007F FFFF    2^23 = 8M               0080 0000 - 7FFF FFFF
10111          0000 0000 - 00FF FFFF    2^24 = 16M              0100 0000 - 7FFF FFFF
11000          0000 0000 - 01FF FFFF    2^25 = 32M              0200 0000 - 7FFF FFFF
11001          0000 0000 - 03FF FFFF    2^26 = 64M              0400 0000 - 7FFF FFFF
11010          0000 0000 - 07FF FFFF    2^27 = 128M             0800 0000 - 7FFF FFFF
11011          0000 0000 - 0FFF FFFF    2^28 = 256M             1000 0000 - 7FFF FFFF
11100          0000 0000 - 1FFF FFFF    2^29 = 512M             2000 0000 - 7FFF FFFF
11101          0000 0000 - 3FFF FFFF    2^30 = 1G               4000 0000 - 7FFF FFFF
11110          0000 0000 - 7FFF FFFF    2^31 = 2G               None
11111          Reserved                 Reserved                Reserved

† Address ranges specified by the STRB ACTIVE bits are listed in Table 7-5.
7.3 Use of the Global Memory Interface Registers

7.3.1 Mapping Addresses to Strobes

Figure 7-3 demonstrates the relationship between the STRB ACTIVE bits
(defined in Table 7-3, page 7-8) and the address ranges over which signals
STRB0 and STRB1 are active. Note that the address ranges of STRBx
and LSTRBx also govern the ranges of their associated signals RDYx,
LRDYx, R/Wx, LR/Wx, PAGEx, LPAGEx, etc. (where x = 0 or 1).

Figure 7-3. Effects of STRB ACTIVE on Global Memory Bus Memory Map

(a) STRB ACTIVE = 11110₂: STRB0 is active from 8000 0000h to FFFF FFFFh (2G words).
(b) STRB ACTIVE = 10101₂: STRB0 is active from 8000 0000h to 803F FFFFh; STRB1 is active from 8040 0000h to FFFF FFFFh.

NOTE: Shown here are two examples for the global memory map. The entire 'C40
memory map (local and global) is shown in Figure 3-9 on page 3-19. Note that
the highest address for LSTRB1 (local bus) is 7FFF FFFFh.



Example (a) of Figure 7-3 shows the reset condition (STRB ACTIVE =
11110₂). In this case, signal STRB0 is active over the entire address range
of the global memory bus (see Table 7-5 for the STRB ACTIVE lookup table).

Example (b) of Figure 7-3 shows the global memory bus memory map
when STRB ACTIVE = 10101₂. In this case, STRB0 is active from addresses
8000 0000h-803F FFFFh, and STRB1 is active from addresses
8040 0000h-FFFF FFFFh (as shown in Table 7-5 for an STRB ACTIVE of
10101₂).
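Tables 7-5 and 7-6 follow a similar pattern. A field value n (01111₂ through 11110₂) gives STRB0 (LSTRB0) the lowest 2^(n+1) words of its bus's address range, and STRB1 (LSTRB1) covers the remainder. The Python sketch below uses a hypothetical function name and reproduces example (b) of Figure 7-3. It models only the nominal ranges in Tables 7-5 and 7-6. It does not account for on-chip memory or peripheral regions of the overall memory map (Figure 3-9).

```python
def active_strobe(strb_active: int, address: int, local: bool = False) -> str:
    """Name the strobe active for a 32-bit address, per Tables 7-5 and 7-6.

    Field value n gives STRB0 (LSTRB0) the lowest 2**(n + 1) words of the
    bus's address range; STRB1 (LSTRB1) covers the rest of that range.
    """
    if not 0b01111 <= strb_active <= 0b11110:
        raise ValueError("reserved STRB ACTIVE value")
    base = 0x0000_0000 if local else 0x8000_0000
    if not base <= address <= base + 0x7FFF_FFFF:
        raise ValueError("address is not on this memory bus")
    prefix = "LSTRB" if local else "STRB"
    strb0_size = 1 << (strb_active + 1)
    return prefix + ("0" if address - base < strb0_size else "1")
```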



7.3.2 Page Size Operation 



Figure 7-4. STRBx PAGESIZE Fields Example

External address bus bits 30-23: define the current page
External address bus bits 22-0: define the address on a page

NOTE: This figure represents an STRBx PAGESIZE field value of 10110₂ (as
shown in Table 7-4 on page 7-9).



The TMS320C40 external interface allows you to specify independent page
sizes (within the 31-bit external address) for the different sets of external strobes. This
capability, shown in the example in Figure 7-4, gives you a great deal of
flexibility in the design of external high-speed, high-density memory systems
and in the use of slower external peripheral devices.

The STRB0 PAGESIZE and STRB1 PAGESIZE fields in the memory interface
control register (shown in Figure 7-2 on page 7-7) work in the same
manner to specify the page size for the corresponding strobe. Table 7-4
(page 7-9) illustrates the relationship between the PAGESIZE field, the
bits of the address used to define the current page, and the resulting page
size. Page size begins at 256 words (with external address-bus bits 7-0
defining the address on a page) and ranges up to 2G words (with external
address-bus bits 30-0 defining the location on a page). The example in
Figure 7-4 shows how a PAGESIZE field value of 10110₂ is translated into bits
30-23 defining the current page and bits 22-0 defining the address on a page.
Changing from one page to another causes a cycle to be inserted in the external
access sequence so that external logic can reconfigure itself appropriately.
The memory interface control logic keeps track of the address used
for the last access for each STRB. When an access begins, the PAGE signal
corresponding to the active STRB goes inactive (high) if the access is to a
new page. The PAGE0 and PAGE1 signals are independent of one another,
and each has its own page-size logic.

At reset, the page-control logic is initialized so that the extra cycle is inserted 
for the first access to the two strobe interfaces. 

The local memory interface has a similar set of control registers. 
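The page-change behavior described above can be modeled per strobe. Each strobe remembers the page of its last access, and the first access after reset always counts as a page change. The Python class below is a simplified, hypothetical model of that bookkeeping only. It reports whether an extra (PAGE-high) cycle would be inserted. It ignores wait states, R/W framing, and the STRB SWITCH cycle.

```python
class PageTracker:
    """Per-strobe page-change bookkeeping, as described in Section 7.3.2.

    Each strobe remembers the page of its last access. An access to a new
    page, or the first access after reset, costs one inserted cycle during
    which the corresponding PAGE signal is high.
    """

    def __init__(self, pagesize_fields: dict):
        # e.g. {"STRB0": 0b00111, "STRB1": 0b10110}; field values per Table 7-4
        self.offset_bits = {s: f + 1 for s, f in pagesize_fields.items()}
        self.last_page = {s: None for s in pagesize_fields}  # None = reset state

    def access(self, strobe: str, address: int) -> bool:
        """Record an access; return True if an extra (PAGE-high) cycle is inserted."""
        page = (address & 0x7FFF_FFFF) >> self.offset_bits[strobe]  # A30-A0 only
        changed = self.last_page[strobe] != page
        self.last_page[strobe] = page
        return changed
```

Because each strobe has its own entry, an STRB0 access never disturbs the page remembered for STRB1. This matches the statement that PAGE0 and PAGE1 are independent.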
7.4 Programmable Wait States

Control wait-state generation by manipulating memory-mapped control reg- 
isters associated with both the global and local interfaces. Use the STRBx 
WTCNT field to load an internal timer, and use the STRBx SWW field to se- 
lect one of the following four modes of wait-state generation: 

□ External RDY 

□ WTCNT-generated RDYwtcnt

□ Logical-AND of RDY and RDYwtcnt

□ Logical-OR of RDY and RDYwtcnt

The application of wait states and ready is covered in Section 13.4 on page
13-27.

The four modes are used to generate the internal ready signal, RDYint, that
controls accesses. As long as RDYint = 1, the current external access is
extended. When RDYint = 0, the current access completes. Since the use
of programmable wait states for both external interfaces is identical, only the
global bus interface is described in the following paragraphs.

RDYwtcnt is an internally generated ready signal. When an external access
begins, the value in WTCNT is loaded into a counter. WTCNT may be any
value from 0 through 7. The counter is decremented every H1/H3 clock
cycle until it becomes 0. Once the counter reaches 0, it remains at 0 until
the next access. While the counter is nonzero, RDYwtcnt = 1. While the
counter is 0, RDYwtcnt = 0.
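A simplified cycle-by-cycle model of the WTCNT counter is sketched below in Python. It assumes one counter step per H1/H3 cycle, starting in the cycle in which WTCNT is loaded. It does not attempt to reproduce the exact clock-edge alignment shown in the timing diagrams, and the function name is hypothetical.

```python
def rdy_wtcnt_sequence(wtcnt: int, cycles: int) -> list:
    """RDYwtcnt level for each H1/H3 cycle of an access.

    Starts in the cycle in which WTCNT is loaded.
    1 = not ready (access extended); 0 = ready.
    """
    if not 0 <= wtcnt <= 7:
        raise ValueError("WTCNT is a 3-bit field (0-7)")
    counter = wtcnt          # loaded when the external access begins
    levels = []
    for _ in range(cycles):
        levels.append(1 if counter != 0 else 0)
        if counter:
            counter -= 1     # decremented each cycle until it reaches 0, then held
    return levels
```

In this model, RDYwtcnt stays high for exactly WTCNT cycles, so WTCNT = 0 asserts ready immediately.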

Table 7-7 is the truth table for each value of SWW, showing the different values
of RDY, RDYwtcnt, and RDYint.
Table 7-7. Wait-State Generation for Each Value of SWW

SWW Value   RDY   RDYwtcnt   RDYint   Description
00          0     0          0        RDYint is dependent only upon RDY.
00          0     1          0        RDYwtcnt is ignored.
00          1     0          1
00          1     1          1
01          0     0          0        RDYint is dependent only upon
01          0     1          1        RDYwtcnt. RDY is ignored.
01          1     0          0
01          1     1          1
10          0     0          0        RDYint is the logical-OR (electrical-
10          0     1          0        AND, since these signals are low
10          1     0          0        true) of RDY and RDYwtcnt.
10          1     1          1
11          0     0          0        RDYint is the logical-AND (electrical-
11          0     1          1        OR, since these signals are low true)
11          1     0          1        of RDY and RDYwtcnt.
11          1     1          1
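All three signals are low true (0 means ready), so Table 7-7 reduces to ordinary bitwise operations on the pin levels. The following Python function is a hypothetical helper that encodes each SWW mode. It can be checked against every row of the table.

```python
def rdy_int(sww: int, rdy: int, rdy_wtcnt: int) -> int:
    """Internal ready per Table 7-7. All signals are low true: 0 = ready."""
    if sww == 0b00:
        return rdy                    # external RDY only; RDYwtcnt ignored
    if sww == 0b01:
        return rdy_wtcnt              # wait-state counter only; RDY ignored
    if sww == 0b10:
        return rdy & rdy_wtcnt        # logical OR of readies = electrical AND
    if sww == 0b11:
        return rdy | rdy_wtcnt        # logical AND of readies = electrical OR
    raise ValueError("SWW is a 2-bit field")
```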
7.5 Timing

Figure 7-5. STRB and RDY Timing

(Timing diagram; signals shown: STRB, RDY)

Note: Dotted lines emphasize the relationships between signals that are further
explained in the accompanying text below.

Throughout this chapter, no distinction is made between global and local
interface signals or between STRB0 and STRB1, except for clarity.

As shown in Figure 7-5, STRB changes on the falling edge of H1, and RDY
is sampled on the falling edge of H1. Throughout the other timing diagrams
in this section, the following general rules apply to the logical timing of the
parallel external interfaces:

1) Changes of R/W are always framed by STRB.

2) A page boundary crossing for a particular STRB results in the corresponding
PAGE signal going high for one cycle.

3) R/W transitions are always on an H1 rising edge.

4) STRB transitions are always on an H1 falling edge.

5) RDY is always sampled on an H1 falling edge.

6) On a read, data is always sampled on an H1 falling edge.

7) On a write, data is always driven out on an H1 falling edge.

8) On a write, data always stops being driven on an H1 rising edge.

9) Following a read, the address, status, and PAGE signals change on an H1
falling edge.

10) Following a write, the status and PAGE signals change on an H1 falling
edge; the address changes on an H1 rising edge.

11) The fetch of an interrupt vector over an external interface is identified by
the status signals for that interface (STAT or LSTAT) as a data read.

12) The interlocked operation status signals (LOCK and LLOCK) have the
same timing as the STAT and LSTAT status signals, respectively.

13) Any time PAGE goes high, STRB goes high.

Figure 7-6 illustrates a read, read, write sequence. This figure assumes
that all three accesses are to the same page and that they are STRB1
accesses. This timing diagram illustrates that back-to-back reads to the same
page are single-cycle accesses. At the transition from a read to a write,
STRB goes high for one cycle in order to frame the R/W signal
change.



Figure 7-6. Read Same Page, Read Same Page, Write Same Page Sequence 
Figure 7-7 shows that STRB goes high between back-to-back writes. As with the
read-to-write transition in Figure 7-6, STRB goes high between a write and a read,
framing the R/W transition.



Figure 7-7. Write Same Page, Write Same Page, Read Same Page Sequence

(Timing diagram; STAT3-STAT0 sequence: STRB1 write, STRB1 write, STRB1 read)


Note: Strobe and Ready Further Defined 

Strobe and ready are discussed from the application viewpoint in Sections 
13.3 (page 13-20) and 13.4 (page 13-27) respectively. 
Figure 7-8 shows that going from one page to another on back-to-back
reads causes an extra cycle to be inserted, and the transition is signaled by
PAGE going high for one cycle. STRB1 also goes high for one cycle.

Figure 7-8. Read Same Page, Read Different Page, Read Same Page Sequence 
Figure 7-9 shows that on back-to-back writes, a page switch is signaled
by PAGE going high for one cycle.



Figure 7-9. Write Same Page, Write Different Page, Write Same Page Sequence 
Figure 7-10. Write Same Page, Read Different Page, Write Different Page Sequence

Figure 7-11. Read Different Page, Read Different Page, Write Same Page Sequence

Figure 7-12. Write Different Page, Write Different Page, Read Same Page Sequence

Figure 7-13. Read Same Page, Write Different Page, Read Different Page Sequence
Figure 7-14 to Figure 7-18 illustrate the idle bus cycles. Idle bus cycle timing
is similar to read cycle timing. The primary differences are that no data
is read, STRB is held high, and RDY is ignored.



Figure 7-14. Read Same Page, Idle One Cycle, Read Same Page Sequence 

H, J" 



R/WO 



STRBO 



RDYO 



PAGEO 



R/W1 



STRB1 



RDY1 



PAGE1 



D31 - DO 



A30 - AO 



X 



CD 



X 



\ 



X 



1 



Z3G 



STAT3-STATq (STRB1 read) ^ idle ^ (STRB1 read) ^ ■ 



7-26 



External Bus Operation 



Timing 



Figure 7-15. Write Same Page, Idle One Cycle, Write Different Page Sequence 
Figure 7-16. Idle, Read Different Page, Idle Sequence

Figure 7-17. Idle, Write Same Page, Idle Sequence

Figure 7-18. Write Different or Same Page, Idle, Idle Sequence
Figure 7-19 illustrates an STRB1 read followed by an STRB0 read when
STRB SWITCH = 0. This mode allows the reads to be back to back, with no
cycles inserted between back-to-back reads that activate different
strobes.



Figure 7-19. Read Same Page on STRB1, Read Same Page on STRB0, Read Same Page on STRB1
Sequence When STRB SWITCH = 0
Figure 7-20 illustrates an STRB1 read followed by an STRB0 read when
STRB SWITCH = 1. In this mode, a cycle is inserted between back-to-back
reads that activate different strobes. If your system memory configuration
is such that bus conflicts can occur during back-to-back reads on different
strobes, this mode provides one cycle between these strobe transitions to
avoid the bus conflicts.



Figure 7-20. Read Same Page on STRB1, Read Same Page on STRB0, Read
Same Page on STRB1 Sequence When STRB SWITCH = 1
Figure 7-21 is similar to Figure 7-19 except that the second read using
STRB1 is to a different page than the first read (using STRB1).



Figure 7-21. Read Same Page on STRB1, Read Same Page on STRB0, Read Different Page on
STRB1 Sequence When STRB SWITCH = 0
Figure 7-22 is similar to Figure 7-20 except that the second read using
STRB1 is to a different page than the first read (using STRB1).



Figure 7-22. Read Same Page on STRB1, Read Same Page on STRB0, Read Different Page on
STRB1 Sequence When STRB SWITCH = 1
Figure 7-23. Write Same Page on STRB1, Write Same Page on STRB0, Read Same Page on STRB1
Sequence
Figure 7-24. Read With One Wait State

Figure 7-25. Write With One Wait State

Figure 7-26. Using Enabled Signals to Put Signal Groups in a High-Impedance State

(Timing diagram; signals shown: H1, signal group, signal group enable)

Figure 7-26 shows the use of an enable signal to control the corresponding
signal group. For example, signal DE controls the global external-interface
signals D31-D0. The enable signals are unsynchronized inputs that turn off
the corresponding output buffers. Some time after the enable signal goes
high (period (1) in Figure 7-26), the corresponding signal
group goes into a high-impedance state. Then, some time after the
enable signal goes low (period (2) in Figure 7-26), the signal group comes
out of the high-impedance state. However, if the signal group is already in
a high-impedance state before the enable signal goes high, the group
comes out of the high-impedance state (when the enable signal goes low)
only if the signal group is in a state requiring it to do so. For example, a data
bus that was not being driven is driven after being enabled only if an access
is pending for the data bus.

If you intend to use internally generated wait states, make certain that
nothing inappropriate occurs when a bus is disabled. A bus can be in the
high-impedance state while internally generated wait states still allow the
access to complete. In this case, data that is written is not seen externally,
and data that is read is whatever value is sampled on the high-impedance
bus.
7.7 Interlocked-Instructions Definition and Bus Timing

The LOCK and LLOCK bus-lock signals are manipulated by the interlocked
instructions LDII, LDFI, STII, STFI, and SIGI. As noted earlier, the timing of the
LOCK and LLOCK pins is the same as that of pins STAT(3-0) and LSTAT(3-0).
Instructions LDII, LDFI, STII, STFI, and SIGI manipulate the bus-lock
signals only when an external memory access is made.

Except for the manipulation of the bus-lock signals, the LDII (Load Integer
Interlocked) and LDFI (Load Floating-Point Interlocked) instructions are identical
to the comparable LDI (Load Integer) and LDF (Load Floating-Point) instructions
in terms of the operation performed and the bus operation. LDII and
LDFI perform as follows:

1 ) The read cycle is begun, and the appropriate bus-lock signal is placed in 
the active-low state. 

2) The read cycle is extended until the appropriate ready signal is active. 

3) Throughout the read cycle and to its conclusion, the bus-lock signal is 
kept in an active-low state until modified by a subsequent STII, STFI, or 
SIGI instruction. 

Figure 7-27 is an example of an LDII or LDFI external access. 
Figure 7-27. LDII or LDFI External Access

Except for manipulation of the bus-lock signals, the STII (Store Integer
Interlocked) and STFI (Store Floating-Point Interlocked) instructions are
the same as the comparable STI (Store Integer) and STF (Store Floating-Point)
instructions in terms of execution and bus operation. STII and STFI operate as
follows:

1) The store cycle is begun, and the appropriate bus-lock signal is kept in
its current state. In most cases, the interlocked store is preceded by an
interlocked load, so the bus-lock signal is already low and stays low. Otherwise,
the bus-lock signal is high, and the interlocked store looks like a noninterlocked
store.

2) The store cycle is extended until the appropriate ready signal is active.

3) When the corresponding STRB goes high at the end of the store cycle
(the corresponding STAT(3-0) also changes at this time), the corresponding
bus-lock signal also goes high.

An STII or STFI instruction to internal memory has no effect on the bus-lock 
signals. 

Figure 7-28 is an example of an STII or STFI external access following the 
previous interlocked load (shown in Figure 7-27) and an idle cycle. This is 
the timing for an interlocked load/interlocked store sequence. 
Figure 7-28. STII or STFI External Access

The SIGI (Signal Interlocked) instruction is similar to the LDII and LDFI
instructions. SIGI functions as follows:

1 ) The read cycle is begun, and the appropriate bus-lock signal is placed in 
the active-low state. 

2) The read cycle is extended until the appropriate ready signal is active. 

3) When the read operation is complete, the bus-lock signal is brought 
high with the same timing as the status signals changing. 

Figure 7-29 is an example of a SIGI external access. 
Figure 7-29. SIGI External Access Timing

The SIGI instruction can be used in a variety of ways. In some applications,
you may wish to externally modify semaphores, perhaps with special-pur- 
pose logic. If so, SIGI can be used to perform a single-cycle interlocked ac- 
cess of the semaphore. The SIGI instruction can also be used simply to 
perform an external read and to signal that a particular point in your code 
has been reached. 



Figure 7-30 illustrates timing for SIGI if the LOCK signal is already low. This
could happen when a SIGI follows an LDII instruction. Since LOCK is already
low, the only effect SIGI has on LOCK is to bring it high.

Figure 7-30. SIGI When LOCK Is Already Low
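The bus-lock rules for LDII/LDFI, STII/STFI, and SIGI can be summarized as a small state model of one LOCK (or LLOCK) pin. The Python class below is a sketch that covers external accesses only, because interlocked instructions that access internal memory leave the pin unchanged. It tracks pin levels, not clock-edge timing. The class and method names are hypothetical.

```python
class BusLock:
    """Level of one bus-lock pin (LOCK or LLOCK) across external accesses.

    Follows the step lists in Section 7.7. 1 = high (inactive), 0 = low (active).
    """

    def __init__(self):
        self.level = 1  # not locked

    def external_access(self, instruction: str) -> tuple:
        """Return (level during the access, level after the access)."""
        if instruction in ("LDII", "LDFI"):
            # Driven low and left low until a later STII, STFI, or SIGI.
            self.level = 0
            return 0, 0
        if instruction in ("STII", "STFI"):
            # Current level kept during the store; brought high when it ends.
            during = self.level
            self.level = 1
            return during, 1
        if instruction == "SIGI":
            # Low for the read, high when the read completes.
            self.level = 1
            return 0, 1
        # Ordinary (noninterlocked) accesses leave the pin unchanged.
        return self.level, self.level
```

Calling external_access("SIGI") right after an LDII reproduces the Figure 7-30 case. The pin is already low, so the only visible effect is the transition back to high.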
7.8 IACK Timing

The IACK pin is affected by the IACK (interrupt acknowledge) instruction.
The timing of the pin is similar to that of the LOCK pin when used by the SIGI
instruction. In all respects (timing, extension with wait states, etc.), IACK
behaves like a LOCK or STAT signal. The only difference is that there is only
one IACK pin.

The timing for the IACK pin is shown in Figure 7-31. Like the interlocked
instructions, the IACK instruction affects IACK only for an external access.
Figure 7-31. IACK Timing

(Timing diagram; signals shown: H1, R/W0, STRB0, RDY0, PAGE0, R/W1, STRB1, RDY1, PAGE1, D31-D0, A30-A0, STAT3-STAT0 (STRB1 read, STRB1 IACK read), and IACK, marking the IACK external access)



Chapter 8

Communication Ports

This chapter provides technical information for the communication ports of
the TMS320C40 digital signal processor (DSP). This chapter is divided into
the following major sections:

Section Page 

8.1 Introduction 8-2 

8.2 Communication Port Features 8-3 

8.3 Operational Overview 8-5 

8.4 Communication Port Memory Map and Registers 8-8 

■ Communication Port Control Registers (CPCRs) 8-9

■ Input Port Register 8-9 

■ Output Port Register 8-9 

8.5 Communication Port Operation 8-12 

■ Port Arbitration Units (PAUs) 8-12 

■ Module Reset 8-14 

■ Halting of Input and Output FIFOs 8-15 

8.6 Coordinating Communication Port Activity with CPU and DMA Coprocessors 8-17

8.7 Communication Port Timing 8-18 

■ Timing Table and Figures 8-18 

■ Synchronizer Timing 8-31 
8.1 Introduction

A parallel processor system supports optimum system performance by dis- 
tributing tasks between two or more processors. This sharing of tasks be- 
tween two or more TMS320C40 DSPs requires that each be able to pass the 
results of its work to another; passing of results enables both DSPs to con- 
tinue working. Processor-to-processor communication is critical in multipro- 
cessor-system design. 

High-performance multiprocessing requires rapid transfer of data between 
processors. To ensure this rapid transfer of data, the TMS320C40 provides 
the following: 

□ Shared memory — The 'C40 global- and local-memory interfaces 
enable easy construction of efficient multiprocessor-based shared 
memory systems. 

□ High-speed communication ports — The 'C40's six high-speed bidi- 
rectional communication ports provide rapid processor-to-processor 
communication on six dedicated communication interfaces. 

Although memory sharing has advantages in some applications, a shared 
bus seriously limits processor communication bandwidth for many applica- 
tions. Using the high-speed communication ports eliminates this obstacle. 
8.2 Communication Port Features

Key features of each TMS320C40 communication port: 

□ 160-megabit-per-second (5-megaword-per-second) bidirectional data
transfer operations (at 40-ns cycle time)

□ direct (glueless) processor-to-processor communication via eight data 
lines and four control lines 

□ buffering of all data transfers, both input and output 

□ automatic arbitration and handshaking to ensure communication syn- 
chronization 

□ synchronization between the CPU or direct-memory access (DMA) 
coprocessor and the six communication ports via internal interrupts and 
internal ready signals 

□ support of a wide variety of multiprocessor architectures, including 
rings, trees, hypercubes, bidirectional pipelines, two-dimensional 
Euclidean grids, hexagonal grids, and three-dimensional grids. 
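The rate in the first feature follows from simple arithmetic. The short Python sketch below works it through, assuming that the 40-ns cycle time is the machine-cycle period and that each 32-bit word crosses the 8-bit CxD(7-0) bus as four bytes; the variable names are illustrative only.

```python
# Worked arithmetic for the quoted peak communication-port rate.
# Assumptions: one machine cycle = 40 ns; one word = 32 bits sent as 4 bytes.
CYCLE_NS = 40
BITS_PER_SECOND = 160_000_000
BITS_PER_WORD = 32

words_per_second = BITS_PER_SECOND // BITS_PER_WORD  # 5 megawords per second
bytes_per_second = BITS_PER_SECOND // 8              # 20 megabytes per second
ns_per_word = 1e9 / words_per_second                 # time for one 32-bit word
cycles_per_word = ns_per_word / CYCLE_NS             # machine cycles per word

print(words_per_second, bytes_per_second, ns_per_word, cycles_per_word)
# 5000000 20000000 200.0 5.0
```

Under these assumptions, a sustained 5 megawords per second corresponds to one 32-bit word every 200 ns, or every five machine cycles.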
Figure 8-1. Communication Port Block Diagram
8.3 Operational Overview

The 'C40 contains six identical high-speed communication ports, each of 
which provides a bidirectional communication interface to an external de- 
vice. Figure 8-1 shows the internal architecture of a single communication 
port. Each port contains the following components: 

□ Input FIFO channel — provides an 8-level, 32-bit wide first-in-first-out 
(FIFO) input buffer that isolates the 'C40 from the port communication 
data bus and buffers data received from an external device via the bus. 

□ Output FIFO channel — provides an 8-level, 32-bit wide FIFO output
buffer that isolates the 'C40 from the port communication data bus and
buffers data to be sent to an external device via the bus.

□ Port arbitration unit (PAU) — handles the arbitration tasks associated 
with the movement of data between a 'C40 and an external device via 
the port communication data bus. Signals arbitrated and controlled by 
the PAU are shown in Figure 8-2. The PAU is described in detail in sub- 
section 8.5.1 on page 8-12. 

□ Communication port control register (CPCR) — allows you to con- 
trol the communication port functions and data transfer operations be- 
tween a 'C40 and an external device via the communication port data 
bus. 



Figure 8-2. TMS320C40 Communication-Port Interface-Connection Example 
Figure 8-2 is an example of two 'C40 DSPs connected via their
communication ports. This simple communication interface consists of the following
bidirectional control and data lines: 



□ CREQx — communication port token request. A 'C40 activates this sig- 
nal to request the use of the communication port data bus. 






□ CACKx — communication port token acknowledge. A 'C40 activates
this signal to relinquish ownership of the communication port data bus
upon receiving a CREQx from another 'C40.

□ CSTRBx — communication port strobe. A sending 'C40 activates this 
signal to indicate that it has placed valid data on the communication port 
data bus. 

□ CRDYx — communication port ready. A receiving 'C40 activates this
signal to indicate that it has received data via the communication port 
data bus. 

□ CxD(7-0) — communication port data bus. This bus carries data 
bidirectionally between two 'C40s or between a 'C40 and some other 
device. 

Figure 8-2 shows two 'C40s connected via their communication ports. The 
communication port data bus, CD(7-0), and its associated control signals 
transfer data in either direction between 'C40s A and B. The PAUs in the two 
'C40s cooperate to generate the signals and control sequences necessary 
to ensure orderly data transfers at the highest possible rate. To avoid con- 
flicts on the bus, these PAUs arbitrate bus ownership, allowing only one 
DSP to transmit at any given time. Either of the PAUs can relinquish bus 
ownership when the other DSP has data to send. 

Signals CREQx and CACKx handle the handshaking arbitration between 
the two DSPs: 

1) The PAU that does not own the data bus (CxD(7-0)) activates CREQx
to request bus ownership.

2) The PAU owning the bus then activates CACKx to acknowledge the re- 
quest and relinquish bus ownership to the requesting PAU. 

3) In this manner, these signals transfer a token (or priority) from one PAU 
to another, and the PAU receiving the token gains ownership of the bus. 
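The three-step token hand-off can be pictured with a small software model. The following Python sketch is hypothetical: the Pau class and request_token function are illustrative names, signals are treated as simply active or inactive (the active-low electrical levels are ignored), and no synchronizer delays are modeled.

```python
from dataclasses import dataclass


@dataclass
class Pau:
    """Hypothetical, timing-free model of one port arbitration unit."""
    name: str
    has_token: bool
    creq: bool = False  # True while this PAU drives CREQx active
    cack: bool = False  # True while this PAU drives CACKx active


def request_token(requester: Pau, owner: Pau) -> None:
    """Move the bus-ownership token from owner to requester (steps 1-3)."""
    if requester.has_token or not owner.has_token:
        raise ValueError("requester must lack the token and owner must hold it")
    requester.creq = True    # 1) the PAU without the bus activates CREQx
    owner.cack = True        # 2) the owning PAU answers with CACKx
    owner.has_token = False  # 3) the token changes hands
    requester.has_token = True
    requester.creq = False   # both handshake lines return to inactive
    owner.cack = False


a = Pau("A", has_token=True)
b = Pau("B", has_token=False)
request_token(b, a)
print(a.has_token, b.has_token)  # False True
```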

During a data transfer operation: 

1) The CPU or DMA coprocessor of the sending DSP writes data to the 
output FIFO (of a communication port) via a memory-mapped address 
(listed in Figure 8-3). 

2) The communication port then places the data on CxD(7-0) and
activates CSTRBx to signal the receiving communication port that the bus
contains valid data.

3) Upon receiving the data in its input FIFO, the receiving communication
port activates CRDYx to indicate that it has received the data.

4) The CPU or DMA coprocessor of the receiving DSP may then read the 
data from the input FIFO via a memory-mapped address (listed in 
Figure 8-3). 
Each of the input and output FIFOs can buffer a maximum of eight 32-bit
words. 

The buffering provided by the input and output FIFOs is important because
it largely decouples computation from communication overhead. When
'C40s A and B are connected via their communication ports, the effective
FIFO depth becomes 16 levels, because the path from A to B is the
concatenation of the eight levels of A's output FIFO with the eight levels of
B's input FIFO. The same applies to the path from B to A.
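The 16-level figure can be checked with a simple queue model. In the hypothetical Python sketch below, A's CPU keeps writing while B's CPU never reads; the link is treated as instantaneous, so a word moves from A's output FIFO to B's input FIFO whenever the latter has room. A's writes stop being ready only after both FIFOs are full.

```python
from collections import deque

FIFO_LEVELS = 8  # depth of each input or output FIFO


def writes_before_not_ready(levels: int = FIFO_LEVELS) -> int:
    """Count words A can write before its output FIFO is full (B never reads)."""
    a_output: deque = deque()
    b_input: deque = deque()
    written = 0
    while True:
        # The link drains A's output FIFO into B's input FIFO while B has room.
        while a_output and len(b_input) < levels:
            b_input.append(a_output.popleft())
        if len(a_output) == levels:  # the next write from A would get not ready
            return written
        a_output.append(written)
        written += 1


print(writes_before_not_ready())  # 16
```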
8.4 Communication Port Memory Map and Registers

Figure 8-3 shows the memory map for the 'C40 communication port control 
registers (CPCRs) and their associated input FIFOs and output FIFOs. The 
lowest three addresses of each port's 16-address block are mapped to a
CPCR and its associated input and output FIFOs. Fields (bits) within a 
CPCR are shown in Figure 8-4. 



Figure 8-3. Communication Port Memory Map 
For example, the addresses for communication port 0 point to (see
Figure 8-3): 

□ address 0010 0040h: CPCR 0

□ address 0010 0041h: input port register 0, FIFO level 0

□ address 0010 0042h: output port register 0, FIFO level 7

□ address range 0010 0043h-0010 004Fh: reserved.
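Because each port occupies its own 16-address block, the three register addresses of any port can be computed from the port number. The Python sketch below assumes, for illustration, that the six blocks are contiguous and start with port 0 at 0010 0040h; confirm the exact addresses against Figure 8-3. The function name comm_port_addresses is hypothetical.

```python
CPCR0_ADDRESS = 0x0010_0040  # CPCR 0, from the example above
PORT_BLOCK_SIZE = 0x10       # each port occupies a 16-address block


def comm_port_addresses(port: int) -> dict[str, int]:
    """CPCR, input-port, and output-port addresses for one port.

    Assumption: the six 16-address blocks are contiguous, starting at port 0.
    """
    if not 0 <= port <= 5:
        raise ValueError("the 'C40 has communication ports 0 through 5")
    base = CPCR0_ADDRESS + port * PORT_BLOCK_SIZE
    return {"cpcr": base, "input": base + 1, "output": base + 2}


for port in range(6):
    regs = comm_port_addresses(port)
    print(port, hex(regs["cpcr"]), hex(regs["input"]), hex(regs["output"]))
# first line printed: 0 0x100040 0x100041 0x100042
```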

8.4.1 Communication Port Control Registers (CPCRs) 

Figure 8-4 shows the format of a TMS320C40 CPCR, which contains con- 
trol and status bits for its associated communication port. Table 8-1 lists the 
CPCR bits and fields and describes their functions. Figure 8-3 lists the 
memory locations of the CPCRs. 

If a full output FIFO is written to, the peripheral-bus interface latches
the word written. Subsequent accesses to the peripheral bus receive a
not-ready signal. This condition clears when an empty position appears in
the output FIFO; the word held in the peripheral-bus input latch is then
transferred to the output buffer of the communication port.

8.4.2 Input Port Register 

This read-only register contains the contents of position 0 of the input FIFO,
the oldest value in the FIFO. If this register is written to, its contents remain
unchanged.

8.4.3 Output Port Register 

This write-only register interfaces to position 7 of the output FIFO (level 7 — 
the newest value in the FIFO). If this register is read, its contents remain 
unchanged, and the value read is undefined (garbage). 
Figure 8-4. Communication Port Control Register (CPCR)




Table 8-1. CPCR Bit Functions

Bit Nos.   Field Name     Function

0-1        Reserved       Undefined

2          PORT DIR       Port Direction. Bit determines the direction of data
                          transfer operations for the communication port.
                          • PORT DIR = 0: port is in the output mode.
                          • PORT DIR = 1: port is in the input mode.

3          ICH            Input Channel Halt.
                          • Write a 1 to ICH to halt the input channel. When the
                            input channel is halted, PORT DIR is set to zero.
                          • Set ICH to 0 when the input channel is to be unhalted;
                            otherwise, the input channel cannot signal externally
                            when it is ready to receive.

4          OCH            Output Channel Halt.
                          • Write a 1 to this bit to immediately halt the output
                            channel. However, the communication port is still able
                            to accept a token request from the input channel.
                          • Set this bit to 0 to allow the output channel to
                            transfer data.

5-8        OUTPUT LEVEL   Output FIFO Level. Contents of this 4-bit field:
                          • 0000₂ (0): indicates an empty output FIFO.
                          • 0001₂ (1) through 0111₂ (7): indicates the number of
                            full positions in the output FIFO.
                          • 1111₂ (15): indicates a full output FIFO.
                          An empty output buffer (OUTPUT LEVEL = 0000₂) causes an
                          unlatched, positive level-triggered interrupt
                          (OCEMPTY = 1) to be sent to the CPU. When the CPU or DMA
                          coprocessor writes to the empty output FIFO, OCEMPTY is
                          set to 0, and it remains in that state until the buffer
                          is again empty. An output FIFO with one or more empty
                          levels also causes an unlatched, positive
                          level-triggered interrupt (OCRDY = 1) to be sent to the
                          CPU and the DMA coprocessor. This condition causes a
                          READY/NOT READY signal to be generated when the CPU or
                          DMA coprocessor attempts to write to the output FIFO.

9-12       INPUT LEVEL    Input FIFO Level. Contents of this 4-bit field:
                          • 0000₂ (0): indicates an empty input FIFO.
                          • 0001₂ (1) through 0111₂ (7): indicates the number of
                            full positions in the input FIFO.
                          • 1111₂ (15): indicates a full input FIFO.
                          A full input FIFO (INPUT LEVEL = 1111₂) causes an
                          unlatched, positive level-triggered interrupt
                          (ICFULL = 1) to be sent to the CPU. When the CPU or DMA
                          coprocessor reads from the full input FIFO, ICFULL is set
                          to 0 and remains in that state until the FIFO is again
                          full. An input FIFO with one or more full levels also
                          causes an unlatched, positive level-triggered interrupt
                          (ICRDY = 1) to be sent to the CPU and the DMA
                          coprocessor. This condition causes a READY/NOT READY
                          signal to be generated when the CPU or DMA coprocessor
                          attempts to read from the input FIFO.

13-31      Reserved       Undefined
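Software that polls a CPCR typically extracts these fields with shifts and masks. The following Python sketch decodes a CPCR value according to the bit positions in Table 8-1; the Cpcr type and the helper names are hypothetical. Because a full FIFO is reported as 1111₂ rather than 1000₂, the second helper translates that code to a count of eight words.

```python
from typing import NamedTuple


class Cpcr(NamedTuple):
    port_dir: int      # bit 2: 0 = output mode, 1 = input mode
    ich: int           # bit 3: input channel halt
    och: int           # bit 4: output channel halt
    output_level: int  # bits 5-8: raw OUTPUT LEVEL code
    input_level: int   # bits 9-12: raw INPUT LEVEL code


def decode_cpcr(value: int) -> Cpcr:
    """Split a CPCR value into the fields of Table 8-1 (reserved bits ignored)."""
    return Cpcr(
        port_dir=(value >> 2) & 0x1,
        ich=(value >> 3) & 0x1,
        och=(value >> 4) & 0x1,
        output_level=(value >> 5) & 0xF,
        input_level=(value >> 9) & 0xF,
    )


def words_in_fifo(level_code: int) -> int:
    """Convert a 4-bit LEVEL code to a word count (1111b means all 8 levels)."""
    if level_code == 0b1111:
        return 8
    if 0 <= level_code <= 0b0111:
        return level_code
    raise ValueError(f"LEVEL code {level_code:04b} is not defined in Table 8-1")


# Example: input mode with three words waiting in the input FIFO.
fields = decode_cpcr((1 << 2) | (3 << 9))
print(fields)  # Cpcr(port_dir=1, ich=0, och=0, output_level=0, input_level=3)
print(words_in_fifo(fields.input_level))  # 3
```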
8.5 Communication Port Operation

8.5.1 Port Arbitration Units (PAUs) 

The PAU is responsible for arbitrating between two devices to determine 
which device has possession of the communication port data bus at any 
given time. This arbitration allows the bus ownership token to be passed 
back and forth between two devices connected via their communication 
ports. During this arbitration process, the PAU is in one of the four states 
listed in Table 8-2. 



Table 8-2. PAU State Definitions

PAU State   Summary (first line) and PAU Status

00          PAU has token (PORT DIR = 0) & channel not in use.
            The PAU currently has possession of the bus ownership token, and
            its associated communication channel is not in use. Under this
            condition, the PORT DIR bit of the associated CPCR is 0 (output).

01          PAU does not have token (PORT DIR = 1) & token not requested by
            PAU (OUTPUT LEVEL = 0).
            The PAU currently does not have possession of the bus ownership
            token and has not requested the token. Under this condition, the
            PORT DIR bit equals 1 (input), and the OUTPUT LEVEL field equals 0
            (empty output FIFO).

10          PAU has token (PORT DIR = 0) & channel is in use
            (OUTPUT LEVEL ≠ 0).
            The PAU currently has possession of the bus ownership token, and
            its associated communication channel is in use. Under this
            condition, the PORT DIR bit equals 0 (output), and the OUTPUT
            LEVEL field does not equal 0.

11          PAU does not have token (PORT DIR = 1) & token requested by PAU
            (OUTPUT LEVEL ≠ 0).
            The PAU currently does not have possession of the bus ownership
            token but has requested the token. Under this condition, the PORT
            DIR bit equals 1 (input), and the OUTPUT LEVEL field does not
            equal 0.
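Table 8-2 implies that the PAU state can be read from two CPCR fields: the right-hand state bit tracks PORT DIR (1 when the PAU lacks the token), and the left-hand bit is 1 whenever OUTPUT LEVEL is nonzero. The Python sketch below encodes that reading of the Summary column; pau_state is a hypothetical helper, not a register or instruction of the device.

```python
def pau_state(port_dir: int, output_level: int) -> str:
    """Two-bit PAU state from Table 8-2, returned as a string such as '10'.

    Left bit: 1 when OUTPUT LEVEL is nonzero (channel in use or token requested).
    Right bit: PORT DIR (0 = PAU holds the token, 1 = it does not).
    """
    busy = 1 if output_level != 0 else 0
    return f"{busy}{port_dir}"


for port_dir in (0, 1):
    for output_level in (0, 3):
        print(port_dir, output_level, pau_state(port_dir, output_level))
# 0 0 00
# 0 3 10
# 1 0 01
# 1 3 11
```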



Figure 8-5 shows the state diagram and controlling equations for the PAU 
state transitions. The figure also includes comments describing how the 
state transitions correspond to various system-level processes. 

To place data on the communication port data bus, the PAU must arbitrate 
between: 

□ on-chip requests to output data on the communication channel data bus 
(CD(7-0)) 

□ external requests received via the CREQ line 

This arbitration is accomplished by passing the bus-ownership token
between the PAUs of two connected communication ports. The PAU
containing the token has ownership of the communication port data bus.
At system reset, half of the communication channels associated with a
particular 'C40 have token ownership (communication ports 0, 1, 2), and
the other half (communication ports 3, 4, 5) do not. This token passing
is done via the CREQ and CACK lines.



Figure 8-5. Communication Port Arbitration Unit State Diagram




To help understand the port arbitration scheme represented in Figure 8-5,
consider a data transfer operation from 'C40 A to 'C40 B. The transfer
begins with PAU A in state 00₂ and PAU B in state 01₂. If PAU A receives a
request (BUSRQ = 1) from its output buffer to use the communication port
data bus, it allows the output buffer to transmit one word immediately and
enters state 10₂. After the output buffer transmits one word, it removes the
bus request (BUSRQ = 0), and PAU A returns to state 00₂.

If PAU B receives a request from its output buffer to use the bus, it activates
CREQ to request the token from PAU A. PAU A detects this request via the
state variable TOKRQ and then activates the CACK line to transfer the bus
ownership token to PAU B. PAU B then generates an internal bus
acknowledge (BUSACK) to indicate that it has gained bus ownership. As a
result of this token transfer operation, PAU A enters state 01₂, and PAU B
enters state 10₂.
Because a PAU always returns to state 00₂ after transmitting a single word,
token passing can be accomplished by 'C40s A and B alternately transmitting
single words. This process provides a fair means of bus arbitration that
prevents either of the output buffers (A's or B's) from being continually
blocked.
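The alternation can be illustrated with a simplified simulation. The Python sketch below is hypothetical and ignores timing: in each round the token owner sends at most one word (state 10₂) and then hands the token over whenever the other side has a word waiting, which stands in for an active CREQ from a PAU in state 11₂. If the other side has nothing to send, the owner keeps the token and continues.

```python
from collections import deque


def arbitrate(a_words: list[str], b_words: list[str], owner: str = "A") -> list[str]:
    """Order in which words cross the link under one-word-per-token-hold."""
    queues = {"A": deque(a_words), "B": deque(b_words)}
    other = {"A": "B", "B": "A"}
    sent: list[str] = []
    while queues["A"] or queues["B"]:
        if queues[owner]:
            sent.append(queues[owner].popleft())  # owner transmits one word
        if queues[other[owner]]:                  # the other PAU is requesting
            owner = other[owner]                  # token passes via CACK
    return sent


print(arbitrate(["a0", "a1", "a2"], ["b0", "b1"]))
# ['a0', 'b0', 'a1', 'b1', 'a2']
```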



If an input buffer becomes full, it will not activate CRDY at the beginning of
the transmission of the first byte that would overflow the buffer. This condi-
tion prevents data transfer operations in either direction until it is resolved
by reading data from the full input buffer.

8.5.2 Module Reset 

At system reset, the input and output channels both assume an empty state,
causing all values in the input and output buffers to be lost. The CREQ,
CACK, CSTRB, and CRDY signals assume an inactive (high) state, and
CxD(7-0) enters its tristate mode (see Figure 8-14 and Figure 8-15 on
page 8-30). These signals remain in these states as long as system reset is
active. Following system reset, the value that the output-configured
communication port places on CxD(7-0) is undefined.

At system reset, communication ports 0, 1, and 2 assume the following 
states: 

□ PAU is reset to state 00₂: The PAU has possession of the bus owner-
ship token, and the channel is not in use.

□ ICRDY = 0: The input channel is empty and is not ready to be read from. 

□ ICH = 0: The input channel is not in its halted state. 

□ OCRDY = 1: The output channel is not full and is ready to be written
to.

□ OCH = 0: The output channel is not in its halted state. 

□ PORT DIR = 0: The communication port is configured for output opera- 
tion. 

□ INPUT LEVEL = 0: The input channel is empty. 

□ OUTPUT LEVEL = 0: The output channel is empty. 

At system reset, communication ports 3, 4, and 5 assume the following 
states: 

□ PAU is reset to state 01₂: The PAU does not have possession of the bus
ownership token, and the token is not requested.

□ ICRDY = 0: The input channel is empty and is not ready to be read from. 
□ ICH = 0: The input channel is not in its halted state.

□ OCRDY = 1: The output channel is not full and is ready to be written
to.

□ OCH = 0: The output channel is not in its halted state. 

□ PORT DIR = 1: The communication port is configured for input opera-
tion.

□ INPUT LEVEL = 0: The input channel is empty. 

□ OUTPUT LEVEL = 0: The output channel is empty. 

Based on these reset conditions, ports 0, 1, and 2 of one DSP should be
connected to ports 3, 4, and 5 of the other.



Connect Ports to Opposite Input/Output Reset Mode

At reset, ports 0, 1, and 2 are configured as output ports (PORT DIR
= 0), and ports 3, 4, and 5 are configured as input ports (PORT DIR
= 1). When connecting ports, connect each to a port that would be
in the opposite direction at reset (any of 0, 1, or 2 connected to any
of 3, 4, or 5).
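The reset rule above can also be checked mechanically when planning board connections. This Python sketch encodes only that rule; the helper names are hypothetical.

```python
OUTPUT_AT_RESET = frozenset({0, 1, 2})  # PORT DIR = 0 after reset
INPUT_AT_RESET = frozenset({3, 4, 5})   # PORT DIR = 1 after reset


def reset_port_dir(port: int) -> int:
    """PORT DIR value that a communication port assumes at reset."""
    if port in OUTPUT_AT_RESET:
        return 0
    if port in INPUT_AT_RESET:
        return 1
    raise ValueError(f"no communication port {port} on the 'C40")


def link_ok_at_reset(port_on_dsp1: int, port_on_dsp2: int) -> bool:
    """True when the two linked ports come out of reset in opposite directions."""
    return reset_port_dir(port_on_dsp1) != reset_port_dir(port_on_dsp2)


print(link_ok_at_reset(0, 3), link_ok_at_reset(2, 5), link_ok_at_reset(1, 2))
# True True False
```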



8.5.3 Halting of Input and Output FIFOs 

The halting of the input and output FIFOs of a communication channel is
controlled by the ICH and OCH bits (input-channel and output-channel halt
bits) of the communication port control register (Figure 8-4 on page 8-10).
The goal of input FIFO halting is to halt the input FIFO as soon as possible,
but without the loss of data being input. A summary of the halted/unhalted
conditions is provided in Table 8-3 on page 8-16.

When the input FIFO is halted, it will not signal a ready when the first incom- 
ing byte is received. At that point, the data transfer is frozen until the input 
FIFO is unhalted or a system reset occurs. If the input FIFO is unhalted later, 
the transfer will continue without any loss of data. 

A communication port with a FIFO that is either halted or full and inactive
will not acknowledge a token request. This ensures that the communication
port's output channel remains open.

If a communication port's input FIFO is halted during a token request from 
the communication port to which it is connected, then the token request is 
acknowledged before halting. 
Table 8-3. Summary of Input and Output FIFO Halting

Input halted, output unhalted
   If the port has token:
      a. Won't release token.
      b. Will transmit data.
   If the port does not have token:
      a. Won't signal ready when first byte is received (transfer frozen).
      b. If halted after first byte is received, it will receive rest of word
         (will signal ready and then halt the input).

Input unhalted, output halted
   If the port has token:
      a. Won't transmit data.
      b. If halted after first byte sent, will complete word transfer and
         then halt the output.
      c. Will release token.
   If the port does not have token:
      a. Will receive data.
      b. Will not request token.

Input halted, output halted
   If the port has token:
      a. Won't release token.
      b. Won't transmit data.
      c. If halted after first byte sent, will complete word transfer and
         then halt the output.
   If the port does not have token:
      a. Won't signal ready when first byte is received (transfer frozen).
      b. If halted after first byte received, it will receive rest of word
         and then halt the input.
      c. Will not request token.


Output FIFO halting is analogous to input FIFO halting. Assume that the
output FIFO of DSP A has OCH = 1. The output FIFO is then halted in a
manner that depends on its current state:

□ If communication port A does not have the token, the output FIFO is 
halted, and no request is made for the token. 

□ If communication port A has the token and is currently transmitting a
word, then after the word is transmitted, no new transfer is begun.

□ If communication port A has the token and the input FIFO is not halted 
and the output FIFO is halted, then it will transfer the token when re- 
quested by communication port B. 

□ If communication port A has the token and the input FIFO is halted and 
the output FIFO is halted, then it will not transfer the token when re- 
quested by communication port B. 

□ When coming out of the halted state, if the communication channel still 
has the token, it may transmit data if necessary. If it needs the token, 
it will arbitrate for the token as usual. 
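For a port with at least one halted FIFO, the token and transmit behavior in Table 8-3 reduces to a few Boolean rules. The Python sketch below is a coarse, hypothetical reading of that table: it ignores the mid-word completion cases, and the may_request_token result for an unhalted output assumes the rule above that a port arbitrates for the token as usual when it needs it.

```python
def halted_port_behavior(has_token: bool, ich: int, och: int) -> dict[str, bool]:
    """Coarse summary of Table 8-3 for a port whose ICH and/or OCH bit is set."""
    if not (ich or och):
        raise ValueError("Table 8-3 only covers halted configurations")
    if has_token:
        return {
            "releases_token": not ich,  # a halted input FIFO keeps the token
            "transmits": not och,       # a halted output FIFO starts no new word
        }
    return {
        "signals_ready": not ich,      # a halted input FIFO freezes the transfer
        "may_request_token": not och,  # a halted output FIFO never requests
    }


print(halted_port_behavior(has_token=True, ich=0, och=1))
# {'releases_token': True, 'transmits': False}
```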
8.6 Coordinating Communication Port Activity With CPU and DMA Coprocessors

The communication ports support two principal modes of synchronization:

□ a ready/not-ready signal that can halt CPU and DMA accesses to a
communication port

□ interrupts that can be used to signal the CPU and DMA coprocessor

The most basic synchronization mechanism is based on a ready/not-ready
signal. If the DMA coprocessor or CPU attempts to read an empty input FIFO
or write to a full output FIFO, a not-ready signal is returned, and the DMA
coprocessor or CPU continues the read or write until a ready signal is
received. The ready signal for the output channel is OCRDY (output channel
ready), which is also an interrupt signal. The ready signal for the input
channel is ICRDY (input channel ready), which is also an interrupt signal.

Interrupts are often a useful form of synchronization. Each communication 
port generates four different interrupt signals, as listed below (interrupt traps 
for these are shown in Figure 3-8 on page 3-16): 

□ ICFULL (input channel full) 

□ ICRDY (input channel ready) 

□ OCRDY (output channel ready)

□ OCEMPTY (output channel empty) 

The CPU can respond to all four of these interrupt signals. The DMA 
coprocessor can respond to the ICRDY and OCRDY interrupt signals. 
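The four conditions follow directly from the FIFO fill levels described in Table 8-1. The Python sketch below derives them from hypothetical word counts (0 to 8) and also records which of them can drive the DMA coprocessor. With both FIFOs empty, it reproduces the reset values listed in subsection 8.5.2 (ICRDY = 0, OCRDY = 1).

```python
FIFO_LEVELS = 8
DMA_VISIBLE = frozenset({"ICRDY", "OCRDY"})  # the DMA coprocessor responds to these


def port_flags(input_words: int, output_words: int) -> dict[str, bool]:
    """Interrupt/ready conditions for one port, given FIFO word counts."""
    return {
        "ICFULL": input_words == FIFO_LEVELS,  # input FIFO full
        "ICRDY": input_words >= 1,             # at least one word to read
        "OCRDY": output_words < FIFO_LEVELS,   # at least one empty output level
        "OCEMPTY": output_words == 0,          # output FIFO empty
    }


flags = port_flags(input_words=0, output_words=0)  # state right after reset
print(flags)
# {'ICFULL': False, 'ICRDY': False, 'OCRDY': True, 'OCEMPTY': True}
print({name for name, active in flags.items() if active and name in DMA_VISIBLE})
# {'OCRDY'}
```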
8.7 Communication Port Timing

In order to describe the timing of the communication ports accurately, it
is important to differentiate between the internal signals applied to the
pin buffers and the external signals seen on the pins. All signals are
buffered and can be placed in a high-impedance state. See Figure 8-6.

In this discussion, internal signals applied to a buffer are identified by suf- 
fixes: 

□ a suffix 'a' for processor A (for example, CSTRBa)

□ a suffix 'b' for processor B (for example, CSTRBb)

□ a suffix 'ab' for the external signal between the two connected
communication ports (for example, CSTRBab and CREQab)

□ a suffix followed by a single quote for the value that the processor sees
by sampling the input pad (for example, CSTRBa')



Figure 8-6. Signal-Naming Example 
8.7.1 Timing Table and Figures

Table 8-4 and the timing figures that follow depict timing sequences in com- 
munication between TMS320C40s using their communication ports. 
Table 8-4 lists handshaking and communication during this intercommuni- 
cation. Steps in the table are shown by numbers in the figures. Events 1 
through 36 in the table compose a token request and token transfer se- 
quence. 
Figure 8-7 Token Transfer Sequence (page 8-23).

1) At the start, the communication port on processor A has the token and is
idle.

2) The communication port on processor B requests the token and, after
receiving the token, transfers a word, one byte at a time:

a) the first byte is bits 7-0

b) the second is bits 15-8

c) the third is bits 23-16

d) the fourth is bits 31-24

3) Once a token-requesting communication port receives the token request
acknowledge, it will always transmit a word.
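Because bits 7-0 travel first, the byte order on CxD(7-0) is least significant byte first. The Python helpers below (hypothetical names) split a 32-bit word into its four bytes in transmission order and reassemble them, as a receiver would.

```python
def word_to_bytes(word: int) -> list[int]:
    """Bytes of a 32-bit word in transmission order: bits 7-0 first, 31-24 last."""
    if not 0 <= word <= 0xFFFF_FFFF:
        raise ValueError("expected a 32-bit unsigned value")
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def bytes_to_word(received: list[int]) -> int:
    """Reassemble four received bytes (first byte = bits 7-0) into a word."""
    if len(received) != 4:
        raise ValueError("a word is transferred as exactly four bytes")
    return sum(byte << (8 * index) for index, byte in enumerate(received))


sent = word_to_bytes(0x12345678)
print([hex(b) for b in sent])    # ['0x78', '0x56', '0x34', '0x12']
print(hex(bytes_to_word(sent)))  # 0x12345678
```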

Figure 8-8 End of Token Transfer Sequence Followed by a Word Transfer 
and the Beginning of a Second Word Transfer (page 8-24). 

Figure 8-9 End of a Word Transfer Followed by a Word Transfer (page 
8-25). 

Figure 8-10 End of a Word Transfer Followed by an Idle State and Token
Transfer (page 8-26).

1) The communication port data bus becomes idle because the output 
FIFO on processor B is empty. 

2) The communication port on processor A requests the token, which is 
then transferred to it by the communication port on processor B. 

Figure 8-11 End of a Word Transfer Followed by an Overlapping Token 
Transfer (page 8-27). 

1) As shown, the token request is received by the communication port on
processor B.

2) The communication port on processor B sees the ready signal for the
last byte of the word being transmitted.

3) Then the communication port releases the token.

4) However, the communication port will not release the token if the token
request is received by the port on processor B after that port sees the
ready signal for the last byte of the word being transmitted.

5) If the communication port on processor B does not have another word in
the output FIFO to transmit, it will release the token.

Figure 8-12 End of the Transfer of the Last Word in an Output FIFO Fol- 
lowed by an Idle Condition Until Another Word Is Available to Be Transferred 
(page 8-28). This begins with a word transfer followed by an idle state due to 
an empty output FIFO. Then a word is written to the output FIFO and trans- 
ferred. 

Figure 8-13 End of a Word Transfer Followed by a Not Ready Due to the 
Input FIFO Becoming Full, Continuing Once the Input FIFO Is No Longer 
Full (page 8-29). This shows the use of the ready line to generate wait 
states. 
1) In this case, a word is transferred that fills the input FIFO of the
communication port of processor A.

2) At the beginning of transmission of the next word, the communication
port on processor A does not signal that it is ready until the input FIFO is
no longer full.



Table 8-4. Handshaking Events in Communication Port Intercommunication

Event No.†  Description

1     B requests the token by bringing CREQb low.
2     A sees the token request when CREQa' goes low.
3     After a type 1 delay from CREQa' falling, A acknowledges the request by bringing CACKa low.
4     B sees the acknowledgement from A when CACKb' goes low.
5     A switches CRDYa from tristate to high on the first H1 rising after CACKa falling.
6     A tristates CaD(7-0) on the first H1 rising after CACKa falling.
7     B switches CSTRBb from tristate to high after CACKb' falling.
8     B brings CREQb high after a type 1 delay from CACKb' falling.
9     A sees CREQa' go high.
10    A brings CACKa high after CREQa' goes high.
11    A tristates CSTRBa after CACKa goes high.
12    A tristates CACKa after CREQa' goes high and after CACKa goes high.
13    A switches CREQa from tristate to high after CREQa' goes high.
14    B tristates CREQb after CREQb goes high.
15    B switches CACKb from tristate to high after CREQb goes high.
16    B tristates CRDYb on H1 rising after CREQb goes high.
17    B drives the first byte onto CbD(7-0) on H1 rising after CREQb goes high.
18    A sees the first byte on Ca'D(7-0).
19    B brings CSTRBb low on the second H1 rising after CREQb rising.
20    A sees CSTRBa' go low, signaling valid data.
21    A reads the data and brings CRDYa low.
22    B sees CRDYb' go low, signaling data has been read.
23    B drives the second byte on CbD(7-0) after CRDYb' goes low.
24    A sees the second byte on Ca'D(7-0).
25    B brings CSTRBb high after CRDYb' goes low.
26    A sees CSTRBa' go high.
27    A brings CRDYa high after CSTRBa' goes high.
28    B sees CRDYb' go high.
29    B brings CSTRBb low after CRDYb' goes high.
30    A sees CSTRBa' go low, signaling valid data.
31    A reads the data and brings CRDYa low.
32    B sees CRDYb' go low, signaling data has been read.
33    B drives the third byte on CbD(7-0) after CRDYb' goes low.
34    A sees the third byte on CaD(7-0).
35    B brings CSTRBb high after CRDYb' goes low.
36    A sees CSTRBa' go high.
37    A brings CRDYa high after CSTRBa' goes high.
38    B sees CRDYb' go high.
39    B brings CSTRBb low after CRDYb' goes high.
40    A sees CSTRBa' go low, signaling valid data.
41    A reads the data and brings CRDYa low.
42    B sees CRDYb' go low, signaling data has been read.
43    B drives the fourth byte on CbD(7-0) after CRDYb' goes low.
44    A sees the fourth byte on CaD(7-0).
45    B brings CSTRBb high after CRDYb' goes low.
46    A sees CSTRBa' go high.
47    A brings CRDYa high after CSTRBa' goes high.
48    B sees CRDYb' go high.
49    B brings CSTRBb low after CRDYb' goes high.
50    A sees CSTRBa' go low, signaling valid data.
51    A reads the data and brings CRDYa low.
52    B sees CRDYb' go low, signaling data has been read.
53    B brings CSTRBb high after CRDYb' goes low.
54    A sees CSTRBa' go high.
55    A brings CRDYa high after CSTRBa' goes high.
56    B sees CRDYb' go high.
57    B drives the first byte of the next word onto CbD(7-0) after a type 2 delay from CRDYb' falling (52).
58    A sees the first byte of the next word on CaD(7-0).
59    B lowers CSTRBb after a type 2 delay from CRDYb' falling.

† Event No. corresponds to numbers in the timing diagrams that follow.

These events are identified by event number in the following figures that de- 
scribe the communication port timing. 
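Once the token has been transferred, the driving-side events in Table 8-4 follow a regular pattern: the odd-numbered events 17 through 55 alternate between B's data and strobe actions and A's ready actions, and each new byte is driven as soon as the previous byte has been acknowledged (CRDY low). The Python sketch below regenerates that sequence for one word as an illustration of the ordering only; it models no timing, the function name is hypothetical, and bytes are numbered from 0.

```python
def word_handshake_trace(word: int) -> list[str]:
    """Driving-side actions for one 32-bit word, numbered like Table 8-4.

    Only the odd-numbered actions (17-55) are listed; the matching "sees"
    events on the other processor (even numbers) are omitted.
    """
    data = [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]
    steps: list[str] = []

    def act(text: str) -> None:
        steps.append(f"{17 + 2 * len(steps)}: {text}")

    for i, byte in enumerate(data):
        if i == 0:
            act(f"B drives byte 0 (0x{byte:02X}) onto CbD(7-0)")
        act("B brings CSTRBb low (data valid)")
        act("A reads the data and brings CRDYa low")
        if i < len(data) - 1:  # next byte goes out as soon as this one is taken
            act(f"B drives byte {i + 1} (0x{data[i + 1]:02X}) onto CbD(7-0)")
        act("B brings CSTRBb high")
        act("A brings CRDYa high")
    return steps


for line in word_handshake_trace(0x12345678)[:4]:
    print(line)
# 17: B drives byte 0 (0x78) onto CbD(7-0)
# 19: B brings CSTRBb low (data valid)
# 21: A reads the data and brings CRDYa low
# 23: B drives byte 1 (0x56) onto CbD(7-0)
```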
Figure 8-7. Token Transfer Sequence
Figure 8-8. End of Token Transfer Sequence Followed by a Word Transfer and the Beginning of a
Second Word Transfer 
Figure 8-9. End of a Word Transfer Followed by a Word Transfer
Figure 8-10. End of a Word Transfer Followed by an Idle State and Token Transfer
Figure 8-11. End of a Word Transfer Followed by an Overlapping Token Transfer
NOTE: Events U — B3 are complements to the description in Table 8-4 on page 8-20 (i.e., if "a" is in the
description, substitute "b" and vice versa; for example, CDa' becomes CDb', and CDb' becomes CDa').
Figure 8-12. End of the Transfer of the Last Word in an Output FIFO Followed by an Idle Condition
Until Another Word Is Available to Be Transferred 
Figure 8-13. End of a Word Transfer Followed by a Not Ready Due to the Input FIFO Becoming Full,
Continuing Once the Input FIFO Is No Longer Full 
Figure 8-14 illustrates the state of the signals of a communication port initialized
by a reset as an output port (ports 0, 1, and 2 are configured as output ports at
reset). In this case, CREQ and CRDY are in a high-impedance state, CACK and CSTRB
are high, and undefined values are on CD(7-0).

Figure 8-14. Post-Reset State for an Output Port 
Figure 8-15 illustrates the state of the signals of a communication port initialized
by a reset as an input port (ports 3, 4, and 5 are configured as input ports at
reset). In this case, CREQ and CRDY are high, and CACK, CSTRB, and CD(7-0) are all
in a high-impedance state.

Figure 8-15. Post-Reset State for an Input Port 
8.7.2 Synchronizer Timing

The synchronizers used in the port arbitration unit are of two types. Type-one
synchronizers cause delays that vary from 1 to 2 machine clocks from the receipt of
an input on a pin until the response on an output pin (ignoring analog delays).
Type-two synchronizer delays range from 1.5 to 2.5 machine clocks.

Type-one synchronizers recognize an input when H1 is high, then pass it through an
H3-high/H1-high series of delays. The response is at the start of the following H3
high.

The minimum type-one synchronizer delay of one machine clock occurs when the input
changes just before H1 goes low. This delay is shown in Figure 8-16.

The maximum type-one synchronizer delay of two machine clocks occurs when the input
changes just after H1 goes low. This delay is shown in Figure 8-17.



Figure 8-16. Type-One Synchronizer Minimum Delay 




Figure 8-17. Type-One Synchronizer Maximum Delay 
Type-two synchronizers first recognize an input when H1 is high, then pass it
through an H3-high/H1-high/H3-high series of delays. The response is at the start of
the following H1 high.

The minimum type-two synchronizer delay of 1.5 machine clocks occurs when the input
changes just before H1 goes low. This delay is shown in Figure 8-18.

The maximum type-two synchronizer delay of 2.5 machine clocks occurs 
when the input changes just after H1 goes low. This delay is shown in 
Figure 8-19. 
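The delay ranges above can be reproduced with a small timing model. The C sketch below is an idealized illustration, not a description of the silicon: it assumes time is measured in machine clocks with H1 falling at every integer time, and that an input is captured at the first H1 falling edge at or after it changes. The type and function names are hypothetical.

```c
#include <assert.h>

/* Idealized synchronizer model (analog delays ignored).
 * Assumptions: time is in machine clocks, H1 falls at every integer
 * time, and an input is captured at the first H1 falling edge at or
 * after the moment it changes. */
typedef enum { SYNC_TYPE_ONE, SYNC_TYPE_TWO } sync_type;

/* Next H1 falling edge at or after t (t must be non-negative). */
static double next_h1_fall(double t)
{
    long k;
    assert(t >= 0.0);
    k = (long)t;              /* truncation equals floor for t >= 0 */
    if ((double)k < t)
        k++;                  /* round up to the next edge */
    return (double)k;
}

/* Input-to-output delay, in machine clocks, for an input that changes
 * at time t_change. After capture, a type-one synchronizer responds at
 * the start of the next H3-high phase (1 clock later); a type-two
 * synchronizer responds at the start of the next H1-high phase
 * (1.5 clocks later). */
double synchronizer_delay(sync_type type, double t_change)
{
    double offset = (type == SYNC_TYPE_ONE) ? 1.0 : 1.5;
    return next_h1_fall(t_change) + offset - t_change;
}
```

With this model, an input changing just before H1 falls (t_change = 0.99) gives a type-one delay of about 1.01 clocks, and one changing just after H1 falls (t_change = 0.01) gives about 1.99 clocks; the type-two results are 0.5 clock longer in each case, matching the ranges stated above.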

Using these two types of synchronizers, the synchronizer delays for the 
communication port signals are tabulated in Table 8-5. 



Figure 8-18. Type-Two Synchronizer Minimum Delay 
Figure 8-19. Type-Two Synchronizer Maximum Delay
Table 8-5. Communication Port Signals and Synchronizer Delays

Input Signal to Output Signal        Delay   Min. Delay       Max. Delay
                                     Type    (clock cycles)   (clock cycles)
CREQi to CACKi                       One     1                2
CACKi to CREQi                       One     1                2
CRDYi to CD valid for a new word     Two     1.5              2.5
CACKi to CSTRB active                        0.5              1.5
CRDYi to CSTRBi between              Two     1.5              2.5
  back-to-back word transfers
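Because every bound in Table 8-5 is a multiple of half a machine clock, the table can be stored as integers and scaled to nanoseconds for a given cycle time. The following C sketch does that; the struct, array, and function names are hypothetical, and the cycle time is a parameter supplied by the caller rather than a value taken from this table.

```c
#include <assert.h>
#include <stddef.h>

/* Table 8-5 as data, in half machine clocks so every bound is an integer. */
typedef struct {
    const char *path;        /* input signal to output signal   */
    unsigned min_half_clks;  /* minimum delay, in half clocks   */
    unsigned max_half_clks;  /* maximum delay, in half clocks   */
} comport_sync_delay;

static const comport_sync_delay table_8_5[] = {
    { "CREQi to CACKi",                   2, 4 },  /* 1 to 2 clocks     */
    { "CACKi to CREQi",                   2, 4 },  /* 1 to 2 clocks     */
    { "CRDYi to CD valid for a new word", 3, 5 },  /* 1.5 to 2.5 clocks */
    { "CACKi to CSTRB active",            1, 3 },  /* 0.5 to 1.5 clocks */
    { "CRDYi to CSTRBi (back-to-back)",   3, 5 },  /* 1.5 to 2.5 clocks */
};

/* Number of rows in the table. */
size_t table_8_5_rows(void)
{
    return sizeof table_8_5 / sizeof table_8_5[0];
}

/* Converts a delay in half clocks to nanoseconds for a machine-cycle
 * time of cycle_ns nanoseconds (caller-supplied assumption). */
double half_clocks_to_ns(unsigned half_clks, double cycle_ns)
{
    assert(cycle_ns > 0.0);
    return half_clks * cycle_ns / 2.0;
}
```

With a 40-ns machine cycle, used here only as an example value, the CREQi-to-CACKi path works out to 40 to 80 ns and the CRDYi-to-CD path to 60 to 100 ns.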
DMA Coprocessor and C40 Timers




This chapter provides technical information for two important TMS320C40 ('C40)
functions: the direct memory access (DMA) coprocessor and the timers. Both are
on-chip parts of the 'C40 digital signal processor (DSP). The first nine major
sections of this chapter cover the DMA coprocessor; the last section covers the
timers.
9.3 DMA Coprocessor Registers 9-7

■ DMA Channel Control Register 9-7 

■ DMA Channel Address and Index Registers 9-16 

■ DMA Channel Transfer-Counter Register 9-18 

■ DMA Channel Link-Pointer Register 9-19 

9.4 DMA Coprocessor Channels in Unified and Split Mode 9-20 

9.5 DMA Coprocessor Internal Priority Schemes 9-22 

9.6 CPU and DMA Coprocessor Arbitration 9-27 

9.7 Data Transfer Modes 9-28 

9.8 Autoinitialization 9-31 

9.9 DMA Coprocessor and Interrupts 9-40 

9.10 TMS320C40 Timers 9-45 


Note: DMA Programming Examples in Chapter 12 

Besides the descriptions of DMA operation in this section, programming 
examples and explanations are provided in Chapter 12. 
The primary benefit of the DMA coprocessor is to maximize sustained CPU
performance by completely relieving the CPU of burdensome I/O duties.

The DMA coprocessor supports six DMA channels that perform transfers to and from
anywhere in the processor's memory map. For example, transfers can be made to or
from on-chip memory, off-chip memory, and any of the six on-chip communication
ports. The DMA coprocessor can automatically reinitialize its registers via linked
lists stored in memory, allowing the DMA to run continuously without any
intervention by the central processing unit (CPU). The DMA coprocessor can build
circular buffers in memory and perform linear and bit-reversed addressing.

The DMA coprocessor provides you with an unprecedented level of performance and
flexibility for a DSP on-chip DMA coprocessor. The key features of the 'C40 DMA
coprocessor are:

□ six DMA channels for memory-to-memory transfers in unified mode; a special split
mode supporting 12 DMA channels for communication-port-to/from-memory transfers

□ autoinitialization of DMA channel control registers, via linked lists stored in
memory, at the start of a block transfer

□ concurrent CPU and DMA coprocessor operation with DMA transfers at the same rate
as the CPU (supported by separate internal DMA address and data buses)

□ source and destination address registers with variable indices allowing 
stepping through matrices by row or column 

□ bit-reversed addressing for FFTs 

□ synchronization of data transfers via external and internal interrupts 
9.2 DMA Coprocessor Functional Description

The TMS320C40 DMA coprocessor improves data transfer rates in systems that must
perform:

□ memory-to-memory transfers

□ data transfers from an I/O device to memory

□ data transfers from memory to an I/O device

□ data transfers between the on-chip communication ports and memory

□ transfers of a single value to a block of memory, for memory fill and
initialization

The DMA coprocessor can transfer data in a linear fashion or in a bit-reversed
fashion for FFT applications; it can transfer matrix data in a row or column
fashion.

The DMA coprocessor is a self-programming device that allows data transfers to
occur without any intervention from the CPU. This allows data to be moved onto and
off of the 'C40 without distracting the CPU. The result is a processor whose
concurrent I/O rate can keep up with the CPU's high computation rate. The address
map of the DMA coprocessor registers is shown in Figure 9-1. The major registers of
the DMA coprocessor are:

□ control register 

□ source register 

□ source index register 

□ destination register 

□ destination index register 

□ transfer counter register 

□ link pointer register 

Subsections that describe these are listed in Figure 9-2 and in Section 9.3. 

The DMA coprocessor has dedicated on-chip DMA address and DMA data buses. All
accesses made by the six DMA channels are arbitrated in the DMA coprocessor and take
place over these dedicated buses. The DMA channels can run constantly or may be
triggered by an external or internal interrupt, including an interrupt generated by
the on-chip timers and communication ports.
Figure 9-1. DMA Coprocessor Memory Map

0010 00A0h-0010 00A8h   DMA channel 0 registers (see exploded view)
0010 00A9h-0010 00AFh   Reserved
0010 00B0h-0010 00B8h   DMA channel 1 registers (see exploded view)
0010 00B9h-0010 00BFh   Reserved
0010 00C0h-0010 00C8h   DMA channel 2 registers (see exploded view)
0010 00C9h-0010 00CFh   Reserved
0010 00D0h-0010 00D8h   DMA channel 3 registers (see exploded view)
0010 00D9h-0010 00DFh   Reserved
0010 00E0h-0010 00E8h   DMA channel 4 registers (see exploded view)
0010 00E9h-0010 00EFh   Reserved
0010 00F0h-0010 00F8h   DMA channel 5 registers (see exploded view)
0010 00F9h-0010 00FFh   Reserved

Exploded view of each channel's registers:

0010 00z0h   DMA channel control register x
0010 00z1h   Source address x
0010 00z2h   Source address index x
0010 00z3h   Transfer counter x
0010 00z4h   Destination address x
0010 00z5h   Destination address index x
0010 00z6h   Link pointer x
0010 00z7h   Auxiliary transfer counter x
0010 00z8h   Auxiliary link pointer x

x = channel number (e.g., 1 for all registers in channel 1, 2 for all registers in
    channel 2, etc.).
z = corresponding hexadecimal digit for the channel address (e.g., substitute "A"
    for DMA channel 0, "B" for DMA channel 1, etc.).

The subsections describing these registers are listed in Figure 9-2 and in
Section 9.3 on page 9-7.
Figure 9-2. Subsections Where DMA Channel Registers Are Described

Memory       DMA Register                     Described in   On Page
Address                                       Section
0010 00z0h   DMA channel control register x   9.3.1          9-7
0010 00z1h   Source address x                 9.3.2          9-16
0010 00z2h   Source address index x           9.3.2          9-16
0010 00z3h   Transfer counter x               9.3.3          9-18
0010 00z4h   Destination address x            9.3.2          9-16
0010 00z5h   Destination address index x      9.3.2          9-16
0010 00z6h   Link pointer x                   9.3.4          9-19
0010 00z7h   Auxiliary transfer counter x     9.3.3          9-18
0010 00z8h   Auxiliary link pointer x         9.3.4          9-19

x = channel number (e.g., all are 1 for channel 1, all are 2 for channel 2, etc.).
z = corresponding hexadecimal digit for the channel address (e.g., substitute "A"
    for DMA channel 0, "B" for DMA channel 1, etc.; see Figure 9-1).
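The address pattern in Figures 9-1 and 9-2 reduces to a base address, plus 10h times the channel number, plus the register offset, with channel 0 starting at 0010 00A0h. A minimal C sketch of that arithmetic follows; the enum and macro names are hypothetical, chosen only to mirror the register names in Figure 9-2.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets within one DMA channel block (Figure 9-2). */
enum dma_reg {
    DMA_CONTROL   = 0x0,  /* DMA channel control register */
    DMA_SRC_ADDR  = 0x1,  /* source address               */
    DMA_SRC_INDEX = 0x2,  /* source address index         */
    DMA_COUNT     = 0x3,  /* transfer counter             */
    DMA_DST_ADDR  = 0x4,  /* destination address          */
    DMA_DST_INDEX = 0x5,  /* destination address index    */
    DMA_LINK      = 0x6,  /* link pointer                 */
    DMA_AUX_COUNT = 0x7,  /* auxiliary transfer counter   */
    DMA_AUX_LINK  = 0x8   /* auxiliary link pointer       */
};

#define DMA_BASE_ADDR   0x001000A0u  /* channel 0 control register       */
#define DMA_CHAN_STRIDE 0x10u        /* digit z advances by 1 per channel */

/* Word address of register reg in DMA channel ch (0 through 5). */
uint32_t dma_reg_addr(unsigned ch, enum dma_reg reg)
{
    assert(ch <= 5u);
    return DMA_BASE_ADDR + ch * DMA_CHAN_STRIDE + (uint32_t)reg;
}
```

For example, dma_reg_addr(1, DMA_SRC_ADDR) yields 0010 00B1h, the source address register of channel 1 (z = B).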

For example, if a block of data is to be transferred from one region in memory 
to another region in memory: 

1) The source address register of a DMA channel is loaded with the address of the
source memory location.

2) The destination address register of the same DMA channel is loaded with the
address of the destination memory location.

3) The transfer counter is loaded with the number of words to be transferred.

4) If sequential memory accesses are required, the source address index register
and the destination address index register are both set to 1.

5) If reads and writes are to be synchronized to interrupts, the appropriate
synchronization mode is selected in the DMA channel control register.

6) The DMA coprocessor is then started via the START field in the DMA channel
control register.
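Steps 1 through 6 map onto a short sequence of register writes. The C sketch below assumes a hypothetical struct laid out in the register order of Figure 9-2 (on the device it would be overlaid on the channel's block at 0010 00z0h). It selects TRANSFER MODE = 01₂ (stop when the counter reaches zero, Table 9-3) and START = 11₂ (Table 9-5), and leaves every other control field at zero: unified mode, linear addressing, no synchronization, and DMA PRI = 00₂.

```c
#include <assert.h>
#include <stdint.h>

/* One DMA channel's registers in the order of Figure 9-2 (hypothetical type). */
typedef struct {
    volatile uint32_t control;    /* 0010 00z0h */
    volatile uint32_t src_addr;   /* 0010 00z1h */
    volatile int32_t  src_index;  /* 0010 00z2h, signed */
    volatile uint32_t count;      /* 0010 00z3h */
    volatile uint32_t dst_addr;   /* 0010 00z4h */
    volatile int32_t  dst_index;  /* 0010 00z5h, signed */
    volatile uint32_t link;       /* 0010 00z6h */
    volatile uint32_t aux_count;  /* 0010 00z7h */
    volatile uint32_t aux_link;   /* 0010 00z8h */
} dma_channel_regs;

#define DMA_XFER_MODE_SHIFT 2u
#define DMA_XFER_TC_HALT    1u   /* TRANSFER MODE = 01b */
#define DMA_START_SHIFT     22u
#define DMA_START_RUN       3u   /* START = 11b */

/* Memory-to-memory block copy of nwords words, following steps 1-6. */
void dma_start_block_copy(dma_channel_regs *ch, uint32_t src,
                          uint32_t dst, uint32_t nwords)
{
    assert(nwords != 0u);        /* a count of 0 would not stop early (9.3.3) */
    ch->src_addr  = src;         /* step 1 */
    ch->dst_addr  = dst;         /* step 2 */
    ch->count     = nwords;      /* step 3 */
    ch->src_index = 1;           /* step 4: sequential source ...   */
    ch->dst_index = 1;           /*         ... and destination     */
    /* Steps 5 and 6: one control write selects the modes (no sync) and
     * sets START; it comes last so the channel starts with valid values. */
    ch->control = (DMA_XFER_TC_HALT << DMA_XFER_MODE_SHIFT)
                | (DMA_START_RUN << DMA_START_SHIFT);
}
```

Writing the control register last means the channel does not start until its address, index, and count registers already hold their new values.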

A DMA transfer consists of two steps: 

1) The source data value is read by the DMA channel and stored in a 
temporary register. 

2) The temporary register value is written to the destination address. 

During every data write, the transfer counter is decremented. The block 
transfer can be terminated when the transfer counter goes to zero and the 
write of the last transfer is complete. 

After a read by the DMA channel, the source-index register is added to the
source-address register. After a write by the DMA channel, the destination-index
register is added to the destination-address register. (Both index registers
contain signed values.) This allows variable step sizes, including backward steps.
When an index register is zero, the DMA coprocessor repeatedly transfers data from
or to the same fixed location.
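The read-then-write sequence and the index updates described above can be modeled on the host to check an address pattern before programming a channel. This C sketch treats memory as a word array and addresses as array indices (a simplification, not device code); it models unified mode with linear addressing only, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of one unified-mode block transfer, linear addressing. */
void dma_model_block(uint32_t *mem, long mem_words,
                     long src, long src_index,
                     long dst, long dst_index,
                     uint32_t count)
{
    assert(count != 0u);               /* zero-count wraparound not modeled */
    while (count != 0u) {
        uint32_t temp;
        assert(src >= 0 && src < mem_words);
        temp = mem[src];               /* 1) read into the temporary register */
        src += src_index;              /*    source index added after the read */
        assert(dst >= 0 && dst < mem_words);
        mem[dst] = temp;               /* 2) write the temporary register */
        dst += dst_index;              /*    destination index added after the write */
        count--;                       /*    counter decremented on every write */
    }
}
```

A negative source index copies a block in reverse order, and a source index of zero with a destination index of 1 fills a block with one value, which is the memory-fill case listed in Section 9.2.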

At the completion of a block transfer, the DMA coprocessor can be pro- 
grammed to do several things: 

□ most importantly, autoinitialize itself at the start of the next block transfer.
Each DMA channel can read new control register values from memory (as well as the
other registers in Figure 9-2), load these values into its register block, and,
according to the values loaded, begin another block transfer. This
autoinitialization is done without any intervention by the CPU.

□ generate an interrupt to signal that the block transfer is complete

□ stop until reprogrammed

A special split mode allows a DMA channel to have its source and destination paths
split and bound to a communication port. In this mode, the DMA-channel source path
(source-address register, source-index register, transfer-counter register, and
link-pointer register) forms the primary split channel and is used to move data
from a location in the processor's memory map to a communication port. The
DMA-channel destination path (destination-address register, destination-index
register, auxiliary transfer-counter register, and auxiliary link-pointer register)
is the auxiliary split channel and is used to move data from the same communication
port to a location in the processor's memory map.


Note: DMA Coprocessor Programming Examples in Chapter 12 

Besides the descriptions of DMA coprocessor operation in this section, programming
examples and explanations are provided in Chapter 12.
9.3 DMA Coprocessor Registers

The DMA coprocessor has nine registers, designated as follows (for location, see
Figure 9-2 on page 9-5):

□ DMA channel control register (subsection 9.3.1)

□ DMA-channel source-address register (subsection 9.3.2, page 9-16)

□ DMA-channel source-address-index register (subsection 9.3.2, page 9-16)

□ DMA-channel destination-address register (subsection 9.3.2, page 9-16)

□ DMA-channel destination-address-index register (subsection 9.3.2, page 9-16)

□ DMA-channel transfer-count register (subsection 9.3.3 on page 9-18)

□ DMA-channel auxiliary-transfer-count register (subsection 9.3.3 on page 9-18)

□ DMA-channel link-pointer register (subsection 9.3.4 on page 9-19)

□ DMA-channel auxiliary-link-pointer register (subsection 9.3.4 on page 9-19)

Each DMA channel has one of each of these registers, discussed in the following
paragraphs.

9.3.1 DMA Channel Control Register 

The format of the DMA channel control register is shown in Figure 9-3. 
Table 9-1 defines the register bits, the bit names, and the bit functions. 

At reset, the DMA channel control register is set to zero. This makes the 
DMA channel lower priority than the CPU, sets up the source address and 
destination address to be calculated via linear addressing, and configures 
the DMA channel in the unified mode. 

When an external interrupt is used for DMA coprocessor transfer synchronization,
the CPU is responsible for configuring external interrupts as edge- or
level-triggered interrupts (as set in the applicable FUNCx and TYPEx bits of the
interrupt flag register, subsection 3.1.10 on page 3-12).



Figure 9-3. DMA Channel Control Register

Bits    Field                  Access
31      xx                     (reserved)
30      PRIORITY MODE          RW
29-28   AUX STATUS             R
27-26   STATUS                 R
25-24   AUX START              RW-A
23-22   START                  RW
21      AUX TCINT FLAG         R
20      TCINT FLAG             R
19      AUXTCC                 RWSA
18      TCC                    RWS
17-15   COM PORTS              RW-A
14      SPLIT MODE             RWSA
13      WRITE BIT REV          RWSA
12      READ BIT REV           RWS
11      AUX AUTOINIT SYNCH     RWSA
10      AUTOINIT SYNCH         RWS
9       AUX AUTOINIT STATIC    RWSA
8       AUTOINIT STATIC        RWS
7-6     SYNCH MODE             RWSA (bit 7), RWS (bit 6)
5-4     AUX TRANSFER MODE      RWSA
3-2     TRANSFER MODE          RWS
1-0     DMA PRI                RWS

R - Bit may be read.
W - Bit may be written.
S - Bit is shadowed during autoinitialization (no changes take effect until
    autoinitialization is complete).
A - Bit is auxiliary for autoinitialization.
xx - Reserved.



Table 9-1. DMA Channel Control Register Bit Definitions

Bit Nos.  Mnemonic (Read/Write)
          Description

0-1       DMA PRI (R/W)
          DMA coprocessor priority. Defines the arbitration rules used when a DMA
          channel and the CPU request the same resource. Affects all DMA
          coprocessor modes. The rules are listed in Table 9-2, page 9-14.

2-3       TRANSFER MODE (R/W)
          Defines the transfer mode used by the DMA channel. Affects unified mode
          and the primary channel in split mode. The bit values are defined in
          Table 9-3 on page 9-14.

4-5       AUX TRANSFER MODE (R/W)
          Defines the transfer mode used by the DMA channel. Affects the auxiliary
          channel in split mode only. The bit values are defined in Table 9-3 on
          page 9-14.

6-7       SYNC MODE (R/W)
          Determines the mode of synchronization used when performing data
          transfers, as shown in Table 9-4 on page 9-15.
          Note: If a DMA channel is interrupt driven for both reads and writes, and
          the interrupt for the write arrives before the interrupt for the read,
          the write interrupt is latched by the DMA channel. After the read is
          complete, the write can be executed.

8         AUTOINIT STATIC (R/W)
          If bit = 0, the link pointer is incremented during autoinitialization.
          If bit = 1, the link pointer is not incremented (it is static) during
          autoinitialization.
          Affects unified mode and the primary channel in split mode. Keeping the
          link pointer constant is useful when autoinitializing from the on-chip
          communication ports or other stream-oriented devices (such as
          first-in, first-out (FIFO) memory buffers).

9         AUX AUTOINIT STATIC (R/W)
          Acts the same as AUTOINIT STATIC above, except that it affects the
          auxiliary channel in split mode only.

10        AUTOINIT SYNC (R/W)
          Has an effect only when a DMA synchronization mode is selected (SYNC
          MODE, bits 6-7 above). Determines how the interrupt enabled for DMA reads
          by the DMA interrupt enable register (see Figure 3-4, page 3-8) is used:
          If bit = 0, the interrupt is ignored, and the autoinitialization reads
          are not synchronized with any interrupt signal.
          If bit = 1, the interrupt is recognized and is also used to synchronize
          the autoinitialization reads.
          Affects unified mode and the primary channel in split mode (see bit 14,
          SPLIT MODE). The effect of this bit and the SYNC MODE bits on
          autoinitialization is summarized in Table 9-9 on page 9-37.

11        AUX AUTOINIT SYNC (R/W)
          Acts the same as AUTOINIT SYNC above, except that it affects
          autoinitialization synchronization for DMA writes in unified mode and for
          the auxiliary channel in split mode. The effect of this bit and the SYNC
          MODE bits on autoinitialization is summarized in Table 9-9 on page 9-37.

12        READ BIT REV (R/W)
          If bit = 0, the source address is modified using 32-bit linear
          addressing.
          If bit = 1, the source address is modified using 24-bit bit-reversed
          addressing.
          Affects unified mode and the primary channel in split mode.

13        WRITE BIT REV (R/W)
          If bit = 0, the destination address is modified using 32-bit linear
          addressing.
          If bit = 1, the destination address is modified using 24-bit
          bit-reversed addressing.
          Affects unified mode and the auxiliary channel in split mode.

14        SPLIT MODE (R/W)
          Controls the DMA coprocessor mode of operation.
          If bit = 0, DMA transfers are memory to memory. This is referred to as
          unified mode.
          If bit = 1, split mode is entered: the DMA channel is split into two
          channels, allowing a single DMA channel to perform
          memory-to-communication-port and communication-port-to-memory
          transfers.
          SPLIT MODE may be modified by autoinitialization in unified mode or by
          autoinitialization by the auxiliary channel in split mode. Split mode is
          further described in Section 9.4.

15-17     COM PORT (R/W)
          These bits select the communication port (000₂ to 101₂) used for DMA
          transfers.
          If SPLIT MODE = 0, COM PORT has no effect on the operation of the DMA
          channel.
          If SPLIT MODE = 1, COM PORT defines which of the six communication ports
          is used with the DMA channel.
          COM PORT may be modified by autoinitialization in unified mode or by
          autoinitialization by the auxiliary channel in split mode.

18        TCC (R/W)
          Transfer counter interrupt control.
          If TCC = 1, a DMA channel interrupt pulse is sent to the CPU after the
          transfer counter makes a transition to zero and the write of the last
          transfer is complete. If enabled, the corresponding DMA interrupt
          (DMA INT0-INT5) occurs as shown in Figure 3-8, page 3-16.
          If TCC = 0, no DMA channel interrupt pulse is sent to the CPU when the
          transfer counter makes a transition to zero.
          Affects unified mode and the primary channel in split mode. DMA channel
          interrupts to the CPU are edge triggered.

19        AUXTCC (R/W)
          Auxiliary transfer counter interrupt control.
          If bit = 1, a DMA channel interrupt pulse is sent to the CPU after the
          auxiliary transfer counter makes a transition to zero and the write of
          the last transfer is complete. If enabled, the corresponding DMA
          interrupt (DMA INT0-INT5) occurs as shown in Figure 3-8, page 3-16.
          If bit = 0, no DMA channel interrupt pulse is sent to the CPU when the
          auxiliary transfer counter makes a transition to zero.
          Affects the auxiliary channel in split mode only.

20        TCINT FLAG (R)
          Transfer counter interrupt flag.
          This flag is set to 1 whenever the transfer counter makes a transition
          to zero and the write of the last transfer is completed. Whenever the
          DMA channel control register is read, this flag is cleared, unless the
          DMA is setting the flag in the same cycle as the read (in which case
          TCINT FLAG is not cleared).
          TCINT FLAG is set in unified mode and by the primary channel in split
          mode.

21        AUX TCINT FLAG (R)
          Auxiliary transfer counter interrupt flag.
          This flag is set to 1 whenever the auxiliary transfer counter makes a
          transition to zero and the write of the last transfer is completed.
          Whenever the DMA channel control register is read, this flag is
          cleared, unless the DMA coprocessor is setting the flag in the same
          cycle as the read (in which case AUX TCINT FLAG is not cleared). AUX
          TCINT FLAG is set by the auxiliary channel in split mode.
          Since only one DMA-channel interrupt is available for a DMA channel, you
          can determine which event set the interrupt by examining TCINT FLAG and
          AUX TCINT FLAG.

22-23     START (R/W)
          Starts and stops the DMA channel in several different ways (listed in
          Table 9-5, page 9-15). START affects unified mode and the primary
          channel in split mode. If used to hold a channel in the middle of an
          autoinitialization sequence, the START and AUX START bits hold the
          autoinitialization sequence.
          If the START or AUX START bits are being modified by the DMA channel
          (for example, to force a halt code of 10₂ on a transfer-counter-
          terminated block transfer) while an external source is writing to the
          DMA channel control register, the DMA channel's internal modification of
          the START or AUX START bits has priority. See TRANSFER MODE value 01₂ in
          Table 9-3.

24-25     AUX START (R/W)
          Starts and stops the DMA channel in several different ways (listed in
          Table 9-5, page 9-15). AUX START affects the auxiliary channel in split
          mode only.

26-27     STATUS (R)
          Indicates the status of the DMA channel as listed in Table 9-6,
          page 9-16. STATUS is updated in unified mode and by the primary channel
          in split mode. Updates are done every cycle.
          The STATUS and AUX STATUS bits (Table 9-6) are used to determine the
          current status of the DMA channels and whether a DMA channel has halted
          or been reset after a write to the START or AUX START bits.

28-29     AUX STATUS (R)
          Indicates the status of the DMA channel as listed in Table 9-6,
          page 9-16. AUX STATUS is updated by the auxiliary channel in split mode
          only. Updates are done every cycle.

30        PRIORITY MODE (R/W)
          Priority mode of DMA channel access:
          0 = Rotating priority, as shown in Section 9.5 (page 9-22).
          1 = Fixed priority, as shown in Section 9.5.
          This bit is available only in DMA channel 0 (zero).

31        Reserved.
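When building or decoding control words in software, the bit positions of Table 9-1 can be kept in one place. The C sketch below uses hypothetical names for the fields. Because reading the control register clears TCINT FLAG and AUX TCINT FLAG, the helpers are meant to operate on a saved copy of the register value rather than on the register itself.

```c
#include <assert.h>
#include <stdint.h>

/* A control-register field: lowest bit number and width (Table 9-1). */
typedef struct { unsigned lsb, width; } dma_field;

static const dma_field DMAF_PRI                 = {  0, 2 };
static const dma_field DMAF_TRANSFER_MODE       = {  2, 2 };
static const dma_field DMAF_AUX_TRANSFER_MODE   = {  4, 2 };
static const dma_field DMAF_SYNC_MODE           = {  6, 2 };
static const dma_field DMAF_AUTOINIT_STATIC     = {  8, 1 };
static const dma_field DMAF_AUX_AUTOINIT_STATIC = {  9, 1 };
static const dma_field DMAF_AUTOINIT_SYNC       = { 10, 1 };
static const dma_field DMAF_AUX_AUTOINIT_SYNC   = { 11, 1 };
static const dma_field DMAF_READ_BIT_REV        = { 12, 1 };
static const dma_field DMAF_WRITE_BIT_REV       = { 13, 1 };
static const dma_field DMAF_SPLIT_MODE          = { 14, 1 };
static const dma_field DMAF_COM_PORT            = { 15, 3 };
static const dma_field DMAF_TCC                 = { 18, 1 };
static const dma_field DMAF_AUX_TCC             = { 19, 1 };
static const dma_field DMAF_TCINT_FLAG          = { 20, 1 };
static const dma_field DMAF_AUX_TCINT_FLAG      = { 21, 1 };
static const dma_field DMAF_START               = { 22, 2 };
static const dma_field DMAF_AUX_START           = { 24, 2 };
static const dma_field DMAF_STATUS              = { 26, 2 };
static const dma_field DMAF_AUX_STATUS          = { 28, 2 };
static const dma_field DMAF_PRIORITY_MODE       = { 30, 1 };

/* Mask covering the field's bits in the 32-bit register. */
static uint32_t field_mask(dma_field f)
{
    return ((1u << f.width) - 1u) << f.lsb;
}

/* Extracts a field from a saved control-register value. */
uint32_t dma_field_get(uint32_t reg, dma_field f)
{
    return (reg & field_mask(f)) >> f.lsb;
}

/* Returns reg with field f replaced by value (which must fit the field). */
uint32_t dma_field_set(uint32_t reg, dma_field f, uint32_t value)
{
    assert(value < (1u << f.width));
    return (reg & ~field_mask(f)) | (value << f.lsb);
}
```

For example, split mode on communication port 3 with both the primary and auxiliary channels started (START = AUX START = 11₂) gives the control word 03C1 C000h.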
Table 9-2. DMA PRI Bits and CPU/DMA Arbitration Rules

DMA PRI
Bit Nos. 1-0   Effect

0 0            DMA coprocessor access has lower priority than CPU access. If the
               DMA channel and the CPU are requesting the same resource, the CPU
               proceeds. The bits are set this way at reset.

0 1            If the DMA channel and the CPU are requesting the same resource, the
               CPU proceeds. Then, after the CPU access is complete, if the DMA
               coprocessor and CPU are again requesting the same resource, the DMA
               coprocessor proceeds. This rule provides a fair arbitration scheme
               by alternating CPU accesses with the DMA channel's accesses.

1 0            Reserved.

1 1            DMA coprocessor access has higher priority than CPU access. If the
               DMA channel and the CPU are requesting the same resource, the DMA
               coprocessor proceeds.
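The three defined DMA PRI settings can be restated as a decision function. This C sketch illustrates Table 9-2 only; it assumes the caller tracks which side won the previous conflict (which matters only for DMA PRI = 01₂), and its names are hypothetical.

```c
#include <assert.h>

typedef enum { WINNER_CPU, WINNER_DMA } bus_winner;

/* Who proceeds when the CPU and a DMA channel request the same resource.
 * prev is the winner of the previous conflict; it is used only for
 * DMA PRI = 01b, where accesses alternate. */
bus_winner dma_pri_arbitrate(unsigned dma_pri, bus_winner prev)
{
    switch (dma_pri) {
    case 0u:                                   /* 00: CPU always first */
        return WINNER_CPU;
    case 1u:                                   /* 01: alternate */
        return (prev == WINNER_CPU) ? WINNER_DMA : WINNER_CPU;
    case 3u:                                   /* 11: DMA always first */
        return WINNER_DMA;
    default:
        assert(!"DMA PRI = 10b is reserved");
        return WINNER_CPU;
    }
}
```

Seeding prev with WINNER_DMA makes the first conflict under DMA PRI = 01₂ go to the CPU, as the table describes; each later conflict then alternates.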



Table 9-3. TRANSFER MODE and AUX TRANSFER MODE Field Description

Bit Nos. 3-2 (TRANSFER MODE)
or 5-4 (AUX TRANSFER MODE)     Effect

0 0     Transfers are not terminated by the transfer counter, and no
        autoinitialization is performed. TCINT (transfer counter interrupt) can
        still be used to cause an interrupt when the transfer counter makes a
        transition to zero. The DMA channel continues to run. Note that the
        address continues to be updated while the transfer count rolls over to its
        maximum value of 0FFFF FFFFh.

0 1     Transfers are terminated by the transfer counter. No autoinitialization
        is performed. A halt code of 10₂ is placed in the START field when
        transfers are completed.

1 0     Autoinitialization is performed when the transfer counter goes to zero,
        without waiting for CPU intervention.

1 1     The DMA channel is autoinitialized when the CPU restarts it. When the
        transfer counter goes to zero, operation halts until the CPU restarts the
        DMA coprocessor by using the START field in the DMA channel control
        register (bits 22-23 and 24-25, Table 9-5). A halt code of 10₂ is placed
        in the START field by the DMA coprocessor.
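Table 9-3 can likewise be written as a lookup of what happens when the transfer counter reaches zero. The C sketch below only restates the table; the enum and function names are hypothetical, and the START code it returns is the halt code 10₂ that the table says the DMA coprocessor writes for modes 01₂ and 11₂.

```c
#include <assert.h>

/* Behavior when the transfer counter reaches zero. The enumerators are
 * ordered so that their values equal the 2-bit TRANSFER MODE codes. */
typedef enum {
    TC_CONTINUE,            /* 00: keep running; counter rolls over        */
    TC_HALT,                /* 01: stop; no autoinitialization             */
    TC_AUTOINIT_AND_RUN,    /* 10: autoinitialize without CPU intervention */
    TC_HALT_THEN_AUTOINIT   /* 11: stop; autoinitialize on CPU restart     */
} tc_action;

tc_action tc_action_for(unsigned transfer_mode)
{
    assert(transfer_mode <= 3u);
    return (tc_action)transfer_mode;
}

/* START field after the counter reaches zero: modes 01b and 11b write the
 * halt code 10b. Modes 00b and 10b write no halt code (in mode 10b,
 * autoinitialization may still load a new control word). */
unsigned start_after_count_zero(unsigned transfer_mode, unsigned start)
{
    assert(transfer_mode <= 3u && start <= 3u);
    return (transfer_mode == 1u || transfer_mode == 3u) ? 2u : start;
}
```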
Table 9-4. SYNCH MODE Field Description

SYNCH MODE
Bit Nos. 7-6   Effect

0 0            No synchronization. Interrupts are ignored.

0 1            Source synchronization. A read is not performed until an enabled
               interrupt occurs.

1 0            Destination synchronization. A write is not performed until an
               enabled interrupt occurs.

1 1            Source and destination synchronization. A read is performed when an
               enabled interrupt occurs. Then, a write is performed when an enabled
               interrupt occurs. The interrupts used are specified by the DMA READ
               and DMA WRITE fields of the DMA interrupt enable (DIE) register
               (subsection 3.1.8 on page 3-8).
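The gating rules of Table 9-4, together with the note in Table 9-1 that an early write interrupt is latched, can be sketched as follows. This C model rests on an assumption: each synchronizing interrupt is represented as a pending flag that is consumed by the one read or write it releases. Its names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned sync_mode;      /* SYNC MODE field (bits 7-6): 0, 1, 2, or 3       */
    bool read_irq_pending;   /* an enabled read-sync interrupt has occurred     */
    bool write_irq_pending;  /* an enabled write-sync interrupt (latched)       */
} dma_sync_state;

/* If synchronization is required, proceed only by consuming a pending interrupt. */
static bool consume_if_required(bool required, bool *pending)
{
    if (!required)
        return true;         /* no synchronization on this side */
    if (!*pending)
        return false;        /* wait for the interrupt */
    *pending = false;        /* one interrupt releases one access */
    return true;
}

/* Source synchronization applies for SYNC MODE 01b and 11b. */
bool dma_may_read(dma_sync_state *s)
{
    assert(s->sync_mode <= 3u);
    return consume_if_required((s->sync_mode & 1u) != 0u, &s->read_irq_pending);
}

/* Destination synchronization applies for SYNC MODE 10b and 11b. A write
 * interrupt that arrives before the read stays latched in
 * write_irq_pending until the write can use it. */
bool dma_may_write(dma_sync_state *s)
{
    assert(s->sync_mode <= 3u);
    return consume_if_required((s->sync_mode & 2u) != 0u, &s->write_irq_pending);
}
```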


Table 9-5. START and AUX START Field Description

START Bit Nos. 23-22
AUX START Bit Nos. 25-24   Effect

0 0     DMA channel reset. DMA channel read or write cycles in progress are
        completed (not aborted); any data read is ignored. Any pending (not
        started) read or write is canceled. The DMA channel is reset so that when
        it starts, a new transaction begins; that is, a read is performed. In this
        start mode, stopping is immediate with no other registers loaded.

0 1     Halts the DMA channel on the first available read or write boundary. If
        the read or write has begun, it is completed before stopping (i.e., in the
        middle or at the end of a DMA channel transfer). If a read or write has
        not begun, none is started. In this start mode, stopping is immediate with
        no other registers loaded.

1 0     Halts the DMA channel on the first available transfer boundary. If a DMA
        transfer has begun, the entire transfer is completed, including both
        cycles (both read and write operations), before stopping. If a transfer
        has not begun, none is started. In this start mode, stopping is immediate
        with no other registers loaded.

1 1     DMA start. Writing 11₂ to this field starts the DMA process using the
        values in the channel's DMA channel registers (Figure 9-1). If the DMA is
        in autoinitialization, all DMA registers are loaded before the operation
        starts. The DMA coprocessor starts from reset if previously reset (START
        bits = 00₂) or restarts from the previous state if previously halted
        (START bits = 01₂ or 10₂).
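Changing only the START field requires a read-modify-write of the control register. A point to keep in mind is that the read clears TCINT FLAG and AUX TCINT FLAG (Table 9-1), so the value read should be kept if those flags are needed. The C sketch below assumes a pointer to the channel's control register; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_START_SHIFT 22u
#define DMA_START_MASK  (3u << DMA_START_SHIFT)   /* START, bits 23-22 */

/* Writes a new START code (Table 9-5) while preserving the other
 * writable fields. On the device, the read below clears TCINT FLAG and
 * AUX TCINT FLAG, so the value read is returned for the caller to check.
 * Writing back the read-only bits is assumed to have no effect. */
uint32_t dma_write_start(volatile uint32_t *ctrl, uint32_t code)
{
    uint32_t old;
    assert(code <= 3u);
    old = *ctrl;                                   /* clears TCINT flags on hardware */
    *ctrl = (old & ~DMA_START_MASK) | (code << DMA_START_SHIFT);
    return old;
}
```

For example, dma_write_start(ctrl, 2u) requests a halt on the next transfer boundary; the STATUS field (Table 9-6) can then be read to confirm that the channel has halted.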
Table 9-6. STATUS and AUX STATUS Field Description

STATUS Bit Nos. 27-26
AUX STATUS Bit Nos. 29-28   Meaning

0 0     DMA process is at a transfer boundary (between the write of one transfer
        and the read of the next). This is the value at reset, after a halt on a
        transfer boundary, or after a block transfer.

0 1     DMA process is being held (for any reason) in the middle of a DMA
        transfer; that is, in the middle of the read/write operation.

1 0     Reserved.

1 1     DMA channel is not being held or reset.



9.3.2 DMA Channel Address and Index Registers 

As shown in Figure 9-4, both the DMA coprocessor source-address and
destination-address registers have an associated index register. After each DMA
channel read (source address) or write (destination address), the corresponding
(source or destination) address generator adds the index register to the address
register and places the result in the address register. In this way, the address
register acts as an accumulator, because it retains the sum of itself and its index
register.

Address Register + Index Register → Address Register

The values in these registers are undefined at reset. 

Depending upon bits 12 and 13 (READ BIT REV and WRITE BIT REV) of
the DMA channel control register, the addition may be either:

□ linear (normal addition): READ BIT REV = 0 for the source address or
WRITE BIT REV = 0 for the destination address, or

□ bit reversed (reverse carry propagation): READ BIT REV = 1 for the
source address or WRITE BIT REV = 1 for the destination address.

Both index values (source or destination) are signed values. 
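The address update described above can be modeled in a few lines. The following is a behavioral sketch in Python, not TI code. It assumes 32-bit address arithmetic and assumes that bit-reversed mode propagates the carry from the most significant bit toward the least significant bit across the whole 32-bit word. The function names `bitrev32` and `next_address` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def bitrev32(x: int) -> int:
    """Reverse the order of the 32 bits of x."""
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def next_address(addr: int, index: int, bit_rev: bool) -> int:
    """Model of Address Register + Index Register -> Address Register.

    index is a signed value and is reduced to 32-bit two's complement.
    With bit_rev=True the sum uses reverse carry propagation: reversing
    both operands, adding normally, and reversing the result makes the
    carry travel from the MSB toward the LSB.
    """
    idx = index & MASK32
    if not bit_rev:
        return (addr + idx) & MASK32              # linear (normal) addition
    return bitrev32((bitrev32(addr) + bitrev32(idx)) & MASK32)

# Example: an 8-word bit-reversed sweep from an aligned base of 0x100,
# with the index register holding half the buffer length (4).
addr, seq = 0x100, []
for _ in range(8):
    seq.append(addr - 0x100)
    addr = next_address(addr, 4, bit_rev=True)
print(seq)                                       # [0, 4, 2, 6, 1, 5, 3, 7]
print(hex(next_address(0x100, -1, bit_rev=False)))  # 0xff
```

The second call shows why the signed index matters: adding an index of -1 in linear mode steps the address backward by one word.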
Figure 9-4. DMA-Coprocessor Address Generation 
9.3.3 DMA Channel Transfer-Count and Auxiliary-Transfer-Count Registers

These registers contain the number of words to be transferred.

Figure 9-5 shows the six transfer counters and the six auxiliary transfer 
counters. Each auxiliary transfer counter is used when the DMA channel is 
in split mode (described in Section 9.4 on page 9-20). The values in these 
registers are set to zero at reset. 

The counters are decremented after completing the address fetch for the
write portion of a transfer. The TCINT FLAG and AUX TCINT FLAG (bits 20
and 21 of the DMA channel control register, Figure 9-3 on page 9-8) are
not set until the counter is decremented and the write of the last transfer is
completed. Correspondingly, the CPU interrupt controller does not see the
interrupt until the transfer counter is decremented and the write of the
last transfer is completed.

The decrementer checks for equality with zero after the decrement is
performed. As a result, if the count register has a value of 1, the DMA
channel can be halted after only one transfer. The count is treated as an
unsigned integer. Transfers may be halted when a zero count is detected after
a decrement. If the DMA coprocessor channel is not halted after the transfer
counter reaches zero, the counter continues decrementing below zero.
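The decrement-then-test rule can be illustrated with a small model. It is a sketch under stated assumptions, not TI code: the counter is modeled as a 32-bit unsigned value, each loop iteration stands for one complete read/write transfer, and the hypothetical `halt_on_zero` flag stands in for whatever halts the channel at zero (for example, TRANSFER MODE = 01₂).

```python
MASK32 = 0xFFFFFFFF

def run_transfers(count: int, max_transfers: int, halt_on_zero: bool):
    """Return (transfers performed, final counter, TCINT FLAG).

    The counter is unsigned. It is decremented once per transfer and
    compared with zero only after the decrement.
    """
    tcint_flag = False
    done = 0
    while done < max_transfers:
        done += 1                          # one complete read + write
        count = (count - 1) & MASK32       # decrement; unsigned wrap below zero
        if count == 0:                     # zero test follows the decrement
            tcint_flag = True              # flag set once the last write completes
            if halt_on_zero:
                break
    return done, count, tcint_flag

print(run_transfers(1, 10, halt_on_zero=True))   # (1, 0, True)
print(run_transfers(2, 3, halt_on_zero=False))   # (3, 4294967295, True)
```

The first call shows a count of 1 halting after a single transfer; the second shows a channel that is not halted continuing past zero.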

Figure 9-5. DMA Coprocessor Transfer-Count Registers 
9.3.4 DMA-Channel Link-Pointer and Auxiliary-Link-Pointer Registers

The link pointers specify the address from which to load the new DMA chan- 
nel register values when autoinitialization is performed. When a channel 
has exhausted its counter, it will (if appropriately configured) use the link 
pointer to reload itself. Figure 9-6 illustrates the DMA coprocessor link ad- 
dress registers. The values in these registers are undefined at reset. 

For example, under autoinitialization, the steps to load the channel registers 
for DMA channel 0 (as shown in Figure 9-1 on page 9-4) would be: 

1) Get the link pointer for the next DMA operation. The pointer is the memory
address containing the new contents of the first DMA channel 0 register
(the channel control register, as shown in Figure 9-1 on page 9-4).

2) Read the word pointed to by the link pointer and write it to address
0010 00A0h (the first word of the DMA channel 0 registers, as shown in
Figure 9-1).

3) Increment the link pointer. (Skip this step if AUTOINIT STATIC bit = 1.)

4) Read the next word and write it to address 0010 00A1h.

5) Repeat until the entire block of registers is loaded for DMA channel 0.
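The loading steps above can be sketched as a loop in which a dictionary stands in for the memory map. This is a hypothetical behavioral model, not a TI API. The seven-word block length follows the unified-mode load order given in Section 9.8, DMA channel 0's registers are assumed to begin at 0010 00A0h, and the block address, FIFO address, and register values are placeholders.

```python
from collections import deque

DMA0_BASE = 0x001000A0   # assumed address of DMA channel 0's control register
UNIFIED_BLOCK_LEN = 7    # control, src, src index, count, dst, dst index, link ptr

def autoinit(read, regs, link_ptr, static):
    """Load DMA channel 0's registers from the block at link_ptr.

    read(addr) returns the word at addr. With static=False (AUTOINIT
    STATIC = 0) the pointer advances after every word; with static=True
    it stays fixed, as when the words come from a stream such as a FIFO.
    """
    addr = link_ptr
    for i in range(UNIFIED_BLOCK_LEN):
        regs[DMA0_BASE + i] = read(addr)   # steps 2 and 4: read word, write register
        if not static:
            addr += 1                      # step 3: increment link pointer

BLOCK_ADDR = 0x2000      # hypothetical location of the block in memory
values = [0x0, 0x3000, 1, 16, 0x4000, 1, BLOCK_ADDR]   # placeholder values

memory = {BLOCK_ADDR + i: v for i, v in enumerate(values)}
regs = {}
autoinit(memory.__getitem__, regs, BLOCK_ADDR, static=False)

FIFO_ADDR = 0x5000       # hypothetical address of a stream-oriented source
fifo = deque(values)
regs_static = {}
autoinit(lambda _addr: fifo.popleft(), regs_static, FIFO_ADDR, static=True)

print(regs == regs_static)  # True
```

Both calls load the same register contents; only the way the source address advances differs.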

Figure 9-6. DMA Coprocessor Link Pointer Registers 
9.4 DMA Channels in Unified and Split Modes

Unified and split modes are depicted in separate diagrams (Figure 9-7 and
Figure 9-8 on the next page). Split mode transforms one DMA channel
into two DMA channels:

□ Primary Channel: one dedicated to reading data from a location in the 
memory map (external/internal) and writing it to a communication port 

□ Auxiliary Channel: one dedicated to receiving data from a communi- 
cation port and writing it to a location in the memory map 

To accommodate the six communication ports, all six DMA channels can 
support this split mode (DMA channels 0-5). 

The SPLIT MODE bit (bit 14 of the DMA channel control register, 
Figure 9-3) controls the DMA unified or split mode: 

□ For unified mode (Figure 9-7): Set SPLIT MODE bit to 0 (zero) 

□ For split mode (Figure 9-8): Set SPLIT MODE bit to 1 

The COM PORT field of the DMA channel control register (bits 15-17 as 
shown in Figure 9-3) defines which communication port is used (port 0-5). 
Figure 9-8 shows typical operations using one communication port. 

□ The transfer counter register controls the primary channel transfers. 

□ The auxiliary transfer counter register controls the auxiliary channel 
transfers (both these registers shown in Figure 9-1, page 9-4). 

DMA channel arbitration in split mode is described in subsection 9.5.3 on 
page 9-24. 
Figure 9-7. Typical Unified Mode DMA Channel Configuration
Figure 9-8. Typical Split-Mode DMA Configuration
9.5 DMA Coprocessor Internal Priority Schemes

Within the DMA coprocessor, two priority schemes are used to designate 
which channel is serviced next: 

□ a fixed priority scheme with channel 0 always having the highest priority 
and channel 5 the lowest, 

□ a rotating priority scheme which places the just-serviced channel at the 
bottom of the priority list. 

Select the desired scheme by setting bit 30 (PRIORITY MODE) of DMA
channel 0's DMA channel control register (Figure 9-3 and Table 9-1 on
page 9-8):

□ PRIORITY MODE = 0: rotating priority

□ PRIORITY MODE = 1: fixed priority

9.5.1 Fixed Priority Scheme 

This scheme provides a fixed priority (unchanging) for each channel as fol- 
lows: 

Highest priority 0 
1 
2 
3 
4 

Lowest priority 5 

To set up this scheme, set the PRIORITY MODE bit (bit 30) of channel 0's
DMA channel control register to 1 (one).

9.5.2 Rotating Priority Scheme 

In a rotating priority scheme, the last channel serviced becomes the lowest
priority channel. The other channels rotate through the priority list in
sequence: the channel that ranked immediately below the just-serviced channel
becomes the highest priority channel on the following request. The priority
rotates every time the most recently granted channel completes its access.
Figure 9-9 and Figure 9-11 illustrate the rotation of priority across several
DMA coprocessor accesses. At system reset, the channels are ordered from
highest to lowest priority (0, 1, 2, 3, 4, 5).

To set up this scheme, set the PRIORITY MODE bit (bit 30) of channel 0's
DMA control register to 0 (zero).

The DMA coprocessor handles channel arbitration on an access-by-access 
basis; that is, a DMA channel must contend for both the read and the write 
access in unified mode. 
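The rotation rule can be sketched as a small arbiter in which each grant models one access (a read or a write). This is an illustrative model, not TI logic. The class name `RotatingArbiter` is hypothetical, and the model covers only arbitration among channels, not CPU conflicts.

```python
NUM_CHANNELS = 6

class RotatingArbiter:
    """Rotating priority among DMA channels 0-5 (PRIORITY MODE = 0).

    After a channel is serviced it drops to the lowest priority, and the
    channel that followed it becomes the highest. At reset the order is
    0, 1, 2, 3, 4, 5.
    """

    def __init__(self):
        self.highest = 0

    def order(self):
        """Current priority list, highest first."""
        return [(self.highest + k) % NUM_CHANNELS for k in range(NUM_CHANNELS)]

    def grant(self, requesting):
        """Grant one access to the highest-priority requesting channel."""
        for ch in self.order():
            if ch in requesting:
                self.highest = (ch + 1) % NUM_CHANNELS
                return ch
        return None

# The Figure 9-9 scenario: channels 2, 4, and 5 keep requesting.
arb = RotatingArbiter()
services = [arb.grant({2, 4, 5}) for _ in range(6)]
print(services)  # [2, 4, 5, 2, 4, 5]
```

After channel 2 is served, channel 3 heads the list, so channel 4 wins next; this matches the walk-through of Figure 9-9.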
Figure 9-9. Rotating Priority Mode Example of the DMA Coprocessor
† DMA channel requesting an access

Each service is one read access or write access. See 
Figure 9-10 for an example of a read/write sequence. 



At the start of the example in Figure 9-9, channels 2, 4, and 5 are requesting
service. Because channel 2 has the highest priority, it is serviced first. It
then becomes the lowest priority channel, and channel 3 becomes the highest
priority channel. On the following services, channels 4 and 5 are handled in
the same way. Each service means one read access or one write access.
Figure 9-10 shows the entire read and write sequence.



Figure 9-10. DMA Read and Write Sequence Example 
Another way to visualize the rotation of priorities is shown in Figure 9-11.
This example shows the same results as Figure 9-9 and helps to make clear the
rotating nature of the priority scheme. Priority decreases from highest to
lowest in a clockwise direction. The priority rotates in a counterclockwise
direction, with the most recently serviced channel ending up in the lowest
priority position.
Figure 9-11. Example of a Priority Wheel




With the rotating priority scheme, any DMA channel requesting service is
guaranteed to be recognized after a bounded number of higher priority
requests have been serviced. The maximum number of such requests is:

□ five in unified mode 

□ eleven in split mode 

This provides a fair means of preventing a channel from monopolizing the 
system. 

DMA channels that are running and are not synchronized via interrupts are 
always requesting service. 

9.5.3 Split Mode and DMA Channel Arbitration 

When a DMA channel is running in split mode, arbitration between channels 
is similar to that just discussed. The split-mode DMA channel has the same 
priority as the unified DMA channel. The only issue is how to arbitrate
between the primary split channel and the auxiliary split channel. The two
split channels alternate priority with each other in a rotating scheme.

When a DMA channel is in split mode and both paths are simultaneously 
reset via the START and AUX START bits, the output (primary) channel has 
priority over the input (auxiliary) channel. Both the START and AUX START 
bits must be written at the same time in order to achieve this reset condition. 
The priority scheme for channels is slightly different from the scheme for
primary and auxiliary channels within a channel:

□ for channels, priority changes after a read or a write 

□ for the primary and auxiliary channels within a channel, priority changes 
after a complete read and write. 

Figure 9-12 is an example of two channels contending for the DMA bus: 
channel 2 (a split channel) and channel 4. In this case: 

□ only channel 2 (i.e., not channel 4) is being run in split mode 

□ its primary channel is identified as 2pri 

□ its auxiliary channel is identified as 2aux 

□ the paths requesting service are identified with a †

In the example described below, channel 4 will do one complete transfer 
(read and write) for each complete transfer of either channel 2pri or 2aux. 

Figure 9-12. Example of a Channel Priority Scheme in Split Mode
† DMA channel requesting an access

* Split channels requesting access 

2pri = the primary split channel of channel 2 

2aux = the auxiliary split channel of channel 2 

The channel priority scheme in Figure 9-12 is further shown sequentially 
in Figure 9-13 (on the next page): 

1) The first service is a request by the primary split channel of channel 2
(2pri). 2pri reads, and then channel 2 is moved to the lowest priority
level, but 2pri remains the higher priority channel of channel 2.

2) On the second service, channel 4, now a higher priority than channel 
2, reads its source address. 

3) On the third service, the value read by 2pri is written to its destination 
address, and channel 2 is moved to the lowest priority level. Also, 2pri 
is moved to a lower priority than 2aux, channel 2's auxiliary channel. 
Note that the split channel that just completed a read retains a higher 
priority than the other split channel until the data is written to the destina- 
tion address. 
4) On the fourth service, the value read by channel 4 in service 2 is now
written to its destination address and the channel becomes the lowest 
priority. 

Figure 9-13. Service Sequence for Split Mode Priority Example 
5) In the fifth service, 2aux is read and channel 2 becomes the lowest
priority. 

6) On the sixth service, channel 4 is read again, and it becomes the lowest 
priority. 

7) On the seventh and eighth services, the 2aux and channel 4 values 
that were read in services 5 and 6 are now written to their destination 
addresses. After the channel is written, it assumes the lowest priority. 

8) In the ninth service, 2pri is read again as in the first service, and the 
read/write cycle continues as begun in the first service. 
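The service sequence above can be replayed with a short simulation. This is a sketch under stated assumptions, not TI logic. It assumes that channel 4, 2pri, and 2aux are all requesting continuously, that no other channels are active, and that there are no CPU conflicts. The function name `simulate` is hypothetical.

```python
NUM_CHANNELS = 6

def simulate(services: int):
    """Replay the Figure 9-12 example: split channel 2 versus channel 4."""
    highest = 0                 # rotating channel priority; reset order 0..5
    requesting = {2, 4}
    ch4_pending = False         # channel 4 has read; its write is outstanding
    split_turn = "pri"          # split channel currently favored within channel 2
    split_pending = False       # favored split channel has read; write outstanding
    log = []
    for _ in range(services):
        order = [(highest + k) % NUM_CHANNELS for k in range(NUM_CHANNELS)]
        ch = next(c for c in order if c in requesting)
        if ch == 2:
            name = "2" + split_turn
            if split_pending:
                log.append(name + " write")
                split_pending = False
                # Within a channel, priority swaps only after a full read + write.
                split_turn = "aux" if split_turn == "pri" else "pri"
            else:
                log.append(name + " read")
                split_pending = True   # keeps priority until its data is written
        else:
            log.append("4 write" if ch4_pending else "4 read")
            ch4_pending = not ch4_pending
        # Between channels, priority rotates after every read or write.
        highest = (ch + 1) % NUM_CHANNELS
    return log

for step, event in enumerate(simulate(9), start=1):
    print(step, event)
```

The two rotation rules differ only in when they advance: the channel-level pointer moves after every access, while the split-channel turn moves only after a write completes a transfer.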
9.6 CPU and DMA Coprocessor Arbitration

The DMA coprocessor has its own internal buses for transferring data. Only 
when a resource conflict exists between the DMA coprocessor and the CPU 
is it necessary for arbitration between these two. 

When the CPU and DMA coprocessor arbitrate for memory access, the 
memory address along with the channel's DMA PRI bits (bits 0 and 1 of the 
channel control register) are used in this arbitration scheme. These bits are 
described in Table 9-7 below. Higher priority DMA channels will be serviced 
before lower priority DMA channels whose requested address does not con- 
flict with a CPU access or who have higher priority than the CPU. 

The DMA PRI bits of the channel control register (of the DMA channel arbi- 
trating with the CPU) define the arbitration rules. These rules apply when- 
ever the CPU and the highest priority requesting channel request the same 
resource. Otherwise, the CPU and DMA coprocessor access may proceed 
in parallel. 

All arbitration between the CPU and the DMA coprocessor is on an access 
basis; that is, the DMA coprocessor must contend for the read and the write 
accesses of a DMA transfer in unified mode and split mode. 

Table 9-7. DMA PRI Bits and CPU/DMA Arbitration Rules

DMA PRI
(Bits 1-0)   Effect

00₂          DMA access is lower priority than the CPU access. If the DMA
             channel and the CPU are requesting the same resource, then the
             CPU will proceed. (DMA PRI bits are set to 00₂ at reset.)

01₂          If the DMA channel and the CPU are requesting the same resource,
             then the CPU will proceed. Then, after the CPU access is complete,
             if the DMA coprocessor and CPU are again requesting the same
             resource, the DMA coprocessor will proceed. This priority rule
             provides a fair arbitration scheme by alternating CPU accesses
             with a DMA channel's access.

10₂          Reserved.

11₂          DMA access is higher priority than the CPU access. If the DMA
             channel and the CPU are requesting the same resource, the DMA
             will proceed.
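The rules in Table 9-7 can be sketched as a function that decides each conflicting access. This is an illustrative model, not TI logic. It assumes that only accesses in which the CPU and the channel want the same resource are counted, and that the 01₂ alternation begins with the CPU. The name `make_arbiter` is hypothetical.

```python
def make_arbiter(dma_pri: int):
    """Return a function that picks the winner of one same-resource conflict.

    dma_pri is the 2-bit DMA PRI field (bits 1-0) of the contending channel.
    Each call models one conflicting access and returns "CPU" or "DMA".
    """
    if dma_pri == 0b10:
        raise ValueError("DMA PRI = 10 (binary) is reserved")
    last_winner = "DMA"              # so the CPU wins the first 01 conflict

    def arbitrate() -> str:
        nonlocal last_winner
        if dma_pri == 0b00:
            winner = "CPU"           # CPU always proceeds
        elif dma_pri == 0b11:
            winner = "DMA"           # DMA channel always proceeds
        else:                        # 0b01: alternate, CPU first
            winner = "DMA" if last_winner == "CPU" else "CPU"
        last_winner = winner
        return winner

    return arbitrate

fair = make_arbiter(0b01)
print([fair() for _ in range(4)])  # ['CPU', 'DMA', 'CPU', 'DMA']
```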
9.7 Data Transfer Modes

Each DMA channel can operate in four data transfer modes. These modes
differ in:

□ whether or not they use autoinitialization

□ how they operate when autoinitialization is or is not in effect.

Table 9-8 and the following paragraphs describe these data transfers. 



Table 9-8. TRANSFER MODE Field Description Summary

TRANSFER MODE
(Bits 3-2 and 5-4)   Transfer Mode Summary (Subsection)

00₂    Transfers are not terminated by the transfer counter. No
       autoinitialization is performed. The TCINT (transfer count
       interrupt) bit can still be used to cause an interrupt when the
       transfer counter makes a transition to zero. The DMA channel
       continues to run. (9.7.1)

01₂    Transfers are terminated by the transfer counter. No
       autoinitialization is performed. When transfers are complete, a halt
       code of 10₂ is placed in the START field (bits 22-23) or AUX START
       field (bits 24-25) of the DMA channel control register. (9.7.2)

10₂    Autoinitialization is performed when the transfer counter goes to
       zero, without waiting for CPU intervention. (9.7.3)

11₂    The DMA channel is autoinitialized when the CPU restarts it through
       the START field of the DMA channel control register. When the
       transfer counter goes to zero, operation is halted until the CPU
       starts the DMA channel again by using the START field. The DMA
       places a halt code of 10₂ in the START field. (9.7.4)


9.7.1 Running Under TRANSFER MODE = 00₂

When TRANSFER MODE = 00₂, transfers are not terminated when the
transfer counter goes to zero, and no autoinitialization is performed. Even
though the transfer counter does not halt transfers, an interrupt can be
generated on the transfer counter's transition to zero, setting the TCINT
FLAG bit to 1. If the DMA coprocessor channel is not halted after the transfer
counter reaches zero, the counter continues decrementing below zero.

9.7.2 Running Under TRANSFER MODE = 01₂

When TRANSFER MODE = 01₂, transfers are terminated when the transfer
counter goes to zero, and no autoinitialization is performed. When the
transfer counter goes to zero, the DMA channel is halted by forcing 10₂ into
the START or AUX START field.
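Collecting the zero-count behavior of all four modes from Table 9-8 gives a compact summary, sketched here as a Python function. It is an illustrative model, not TI code. It ignores the case in which autoinitialization values themselves stop the channel, and the names `ZeroCountAction` and `on_count_zero` are hypothetical.

```python
from typing import NamedTuple

HALT_CODE = 0b10   # value forced into START / AUX START when the counter halts

class ZeroCountAction(NamedTuple):
    keeps_running: bool     # does the channel carry on without the CPU?
    start_field: int        # START (or AUX START) value afterward
    autoinitializes: bool   # is autoinitialization triggered at this point?

def on_count_zero(transfer_mode: int, start_field: int) -> ZeroCountAction:
    """Summarize Table 9-8: what happens when the transfer counter reaches zero."""
    if transfer_mode == 0b00:   # no termination; the counter keeps decrementing
        return ZeroCountAction(True, start_field, False)
    if transfer_mode == 0b01:   # terminate; no autoinitialization
        return ZeroCountAction(False, HALT_CODE, False)
    if transfer_mode == 0b10:   # autoinitialize without waiting for the CPU
        return ZeroCountAction(True, start_field, True)
    if transfer_mode == 0b11:   # halt; autoinitialize after the CPU writes 11 to START
        return ZeroCountAction(False, HALT_CODE, False)
    raise ValueError("TRANSFER MODE is a 2-bit field")

for mode in range(4):
    print(format(mode, "02b"), on_count_zero(mode, 0b11))
```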
9.7.3 Running Under TRANSFER MODE = 10₂

This transfer mode allows the DMA channel to take care of itself. It can run 
continuously, change pointers and synchronization by the autoinitialization 
procedure, and turn itself off. 

This mode always starts with the DMA channel reset (START or AUX START
= 00₂) or halted (these bits = 01₂ or 10₂) and the transfer counter at
0. This occurs after a system reset, after the DMA channel is software reset
(00₂ written to the START or AUX START bits), or after the channel is halted
(01₂ or 10₂ written to these bits). The process for setting up and running
a DMA channel under transfer mode 10₂ is summarized in Figure 9-14.

1) After placing the DMA channel in the reset or halted state with the
transfer counter at 0, initialize the channel for the desired operation. In
this case, set the TRANSFER MODE bits to 10₂. Because the DMA channel
autoinitializes itself when started in this mode, the CPU needs to
initialize only the DMA channel control register and the DMA channel link
pointer. The other DMA channel registers are set up automatically by the
autoinitialization process. Synchronization of reads and writes is allowed.

2) After the DMA channel is initialized, it can be started by writing
11₂ to the START or AUX START bits.

3) The DMA channel then performs the sequence: autoinitialize, then do a
block transfer.
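Steps 1 and 2 come down to composing a control-register word. The sketch below builds that word in Python. It uses only the field positions given in this chapter (TRANSFER MODE in bits 3-2, START in bits 23-22) and leaves every other field at 0 as a placeholder. A real setup would also program fields such as SYNC MODE and would write the link pointer, and the actual register write is not shown. The helper `set_field` is hypothetical.

```python
TRANSFER_MODE_SHIFT = 2   # bits 3-2
START_SHIFT = 22          # bits 23-22

def set_field(word: int, shift: int, width: int, value: int) -> int:
    """Return word with the width-bit field at shift replaced by value."""
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | ((value << shift) & mask)

# Step 1: channel held in reset (START = 00), TRANSFER MODE = 10.
ctrl = set_field(0, TRANSFER_MODE_SHIFT, 2, 0b10)
ctrl = set_field(ctrl, START_SHIFT, 2, 0b00)
# (The CPU would also initialize the link pointer register at this point.)

# Step 2: start the channel by writing 11 to the START field.
ctrl = set_field(ctrl, START_SHIFT, 2, 0b11)
print(hex(ctrl))  # 0xc00008
```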

Figure 9-14. Running a DMA Channel Under Transfer Mode 10₂
9.7.4 Running Under TRANSFER MODE = 11₂

This transfer mode, besides having all of the advantages of
autoinitialization, allows the CPU to coordinate its operation easily with
the operation of the DMA channels.

Like transfer mode 10₂ (see subsection 9.7.3), this mode always starts out
with the DMA channel reset or halted and the transfer count = 0. This occurs
after a system reset, after the DMA channel is reset by writing 00₂ to the
START or AUX START bits in the DMA channel control register, or after the
channel is halted by writing 01₂ or 10₂ to these bits. The process for setting
up and running a DMA channel under transfer mode 11₂ is summarized in
Figure 9-15.

1) After placing the DMA channel in the halted or reset state with the
transfer counter = 0, initialize the channel for the desired mode of
operation. In this case, set the TRANSFER MODE bits to 11₂. Because the DMA
channel autoinitializes itself when started in this mode, the CPU needs to
initialize only the DMA channel control register and the DMA channel link
pointers. The other DMA channel registers are set up by the
autoinitialization procedure.

2) After the DMA channel is initialized, it can be started by writing
11₂ to the START bits.

3) The DMA channel then autoinitializes itself and does a block transfer.

4) When the transfer counter goes to zero, the channel waits for the CPU to
write 11₂ to the START field of the DMA channel control register.

5) The channel then repeats the sequence: autoinitialize, transfer, and wait.

6) When the transfer count goes to zero, the DMA channel can be halted
by forcing 10₂ into the START or AUX START field.

Figure 9-15. Running a DMA Channel Under Transfer Mode 11₂
9.8 Autoinitialization

When the DMA channel is operating in autoinitialization mode, the link point- 
er register and auxiliary link pointer register are used to initialize the regis- 
ters that control the operation of the DMA channel. These pointers are me- 
mory-address locations for blocks of data that are to be loaded into the DMA 
register file, as shown in Figure 9-1 and Figure 9-2, beginning on page 
9-4. 

How this autoinitialization is done depends upon the current mode of opera- 
tion of the DMA channel and the mode to which it is being autoinitialized. 
In all cases, either the link pointer or auxiliary link pointer (used in DMA split 
channel mode) contains the address of a block of memory that contains the 
new DMA channel register values (registers shown in Figure 9-1 on page 
9-4). 

During autoinitialization, the link pointer may be incremented (AUTOINIT
STATIC = 0) or held constant (AUTOINIT STATIC = 1). (This is bit 8 or 9
of the channel control register, Figure 9-3 on page 9-8.)

□ When the link pointer is incremented, the autoinitialization values are 
stored in sequential memory locations, and the link pointer or auxiliary 
link pointer is incremented in order to access each of these locations. 

□ Holding the link pointer constant is very useful when autoinitializing
the DMA channel from a stream-oriented device such as the on-chip
communication ports or external FIFOs.

The SPLIT MODE bit (bit 14 in Figure 9-3 on page 9-8) defines the mode
under which the DMA channel is currently running. When autoinitializing the
DMA coprocessor, do not change the SPLIT MODE bit. This bit should be
changed only when the DMA coprocessor has been reset and halted (see the
DMA START bit description, Table 9-5 on page 9-15).

Autoinitialization is a DMA operation to the DMA coprocessor's registers; 
i.e., it reads the value pointed to by the link pointer and writes the value to 
the DMA register over the peripheral bus on the next available cycle. 
If the DMA channel is performing memory-to-memory transfers
(SPLIT MODE = 0), the link pointer is used. The DMA channel registers are 
loaded in the following order: 

1) DMA channel control register

2) Source address register 

3) Source address index register 

4) Transfer count register 

5) Destination address register 

6) Destination address index register 

7) Link pointer register 

The storage of new values for these registers in memory is illustrated in
Figure 9-16.

Figure 9-16. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 0) 
If the DMA channel is running in split mode (SPLIT MODE = 1), then the
autoinitialize sequence depends upon which counter has terminated. 

If the transfer-count register has gone to zero with SPLIT MODE = 1,
then the link-pointer register is used for autoinitialization. In this case, the
DMA channel registers are loaded in the following order:

1) DMA channel control register

2) Source address register 

3) Source address index register 

4) Transfer count register 

5) Link pointer register 

The storage of the new values for these registers in memory is illustrated 
in Figure 9-17. 
Figure 9-17. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 1)

Map of New Register Values in Memory 
If the auxiliary counter has gone to zero with SPLIT MODE = 1, then the
auxiliary link pointer register is used for autoinitialization. In this case, the 
DMA channel registers are loaded in the following order: 

1) DMA channel control register

2) Destination address register 

3) Destination address index register 

4) Auxiliary transfer count register 

5) Auxiliary link pointer register 

The storage of the new values of these registers in memory is illustrated in 
Figure 9-18. 
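The three load orders above can be sketched as a lookup that picks the pointer and register sequence, together with a helper that lays out a block in memory the way Figures 9-16 through 9-18 do. This is an illustrative model. It assumes an incrementing pointer (AUTOINIT STATIC = 0) and one word per register, and the names and addresses are hypothetical.

```python
UNIFIED = ["control", "source address", "source index", "transfer count",
           "destination address", "destination index", "link pointer"]
SPLIT_PRIMARY = ["control", "source address", "source index",
                 "transfer count", "link pointer"]
SPLIT_AUX = ["control", "destination address", "destination index",
             "auxiliary transfer count", "auxiliary link pointer"]

def autoinit_order(split_mode, aux_counter_zero=False):
    """Return (pointer used, register load order) for one autoinitialization."""
    if not split_mode:
        return "link pointer", UNIFIED
    if aux_counter_zero:
        return "auxiliary link pointer", SPLIT_AUX
    return "link pointer", SPLIT_PRIMARY

def block_layout(pointer_value, new_values, split_mode, aux_counter_zero=False):
    """Map consecutive memory addresses to the new register values."""
    _, order = autoinit_order(split_mode, aux_counter_zero)
    return {pointer_value + i: new_values[name] for i, name in enumerate(order)}

pointer, order = autoinit_order(split_mode=True, aux_counter_zero=True)
print(pointer, order)
```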



Figure 9-18. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 1 and Auxiliary
Transfer Counter = 0)
Usually, autoinitialization data will be stored in memory. In this case,
synchronization for autoinitialization is not generally necessary. To disable
the synchronization of data reads during autoinitialization, set AUTOINIT
SYNCH (bits 10 and 11, DMA channel control register) to 0. In some cases,
you may wish to transfer autoinitialization data in the same way as the
synchronized data reads and writes. To synchronize autoinitialization based
upon the interrupt identified with the READ SYNCH and WRITE SYNCH fields (DIE
register, page 3-8), set both the AUTOINIT SYNCH and AUX AUTOINIT SYNCH
bits (bits 10-11 of the DMA channel control register) to 1. In this way,
autoinitialization will be synchronized only with the SYNC MODE in effect.

The data reads for autoinitialization are arbitrated for by the DMA channels 
just like a typical DMA access. The only difference is that their synchroniza- 
tion is controlled by AUTOINIT SYNCH. A summary of the autoinitialization 
effect of the SYNC MODE and AUTOINIT SYNC bits is listed in Table 9-9 
on page 9-37. This table pertains to autoinitialization only. 

In unified mode, all of the writable control register bits are affected by 
autoinitialization. These bits are labeled in Figure 9-19. 

In split mode during autoinitialization of the primary DMA channel, the 
writable, nonauxiliary bits may be modified, but auxiliary bits are protected 
(these bits are in Figure 9-20). In other words, only nonauxiliary bits are al- 
lowed to be modified by the CPU or DMA coprocessor. Also, if the auxiliary 
DMA channel is autoinitialized, the writable auxiliary bits may be modified, 
but nonauxiliary bits are protected. These bits are labeled in Figure 9-21. 

In all autoinitialization modes, the shadowed bits (Figure 9-19) that are
writable (W-designated bits in Figure 9-3) do not have an effect until
autoinitialization is complete. Unshadowed bits affect the autoinitialization 
sequence. In other words, at autoinitialization, shadowed bit values will be 
entered last after all registers are loaded (as specified by the link pointer). 

Regardless of whether the DMA channel is running in unified mode or split 
mode, writes by the CPU or another external source to the DMA channel 
control register affect all writable bits. 
Figure 9-19. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization Under
Unified Mode 
Figure 9-20. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization of the
Primary Channel Under Split Mode 
s: These shadowed bits do not take effect until autoinitialization is complete.
xx: Write protected during primary channel autoinitialization.
Figure 9-21. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization of the
Auxiliary Channel Under Split Mode 
s: These shadowed bits do not take effect until autoinitialization is complete.
xx: Write protected during auxiliary channel autoinitialization.
Autoinitialization synchronization is a function of:

□ the SYNC MODE bits (DMA channel control register bits 6 and 7) that 
control synchronization of data transfers, and 

□ the AUTOINIT SYNC bits (DMA channel control register bits 10 and 11)
that affect only autoinitialization synchronization.

If the SYNC MODE bits are not set to synchronize data transfers (i.e., if the
preceding data transfer is not synchronized on interrupts), then the DMA
channel autoinitialization sequence is not synchronized either. If the
SYNC MODE bits are set to transfer data synchronously (i.e., if the
preceding data transfer is synchronized), then the upcoming DMA channel
autoinitialization sequence may be synchronized on reads, writes, or both
(depending on whether the DMA coprocessor is in unified or split mode), as
shown in Table 9-9. For all combinations of the SYNC MODE and AUTOINIT SYNC
bits that Table 9-9 lists as no sync, the DMA channel autoinitialization
sequence is not synchronized on interrupts.

Table 9-9. Effect of SYNC MODE and AUTOINIT SYNC Bits in Autoinitialization

These Bits of the DMA Channel         Cause Autoinitialization
Control Register                      Synchronization To Occur On

SYNC MODE     AUTOINIT SYNC           Unified Mode      Split Mode
(bits 7-6)    (bits 11-10)

00            00, 01, 10, or 11       no sync           no sync
01            00                      no sync           no sync
01            01                      Read              Primary Ch
01            10                      no sync           no sync
01            11                      Read              Primary Ch
10            00                      no sync           no sync
10            01                      no sync           no sync
10            10                      Write             Auxiliary Ch
10            11                      Write             Auxiliary Ch
11            00                      no sync           no sync
11            01                      Read              Primary Ch
11            10                      Write             Auxiliary Ch
11            11                      Read and Write    Primary Ch and Auxiliary Ch
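Every row of Table 9-9 follows one pattern: a path is synchronized during autoinitialization only when both its SYNC MODE bit and its AUTOINIT SYNC bit are 1. In that pattern, bit 6 and bit 10 map to read (unified) or primary (split), and bit 7 and bit 11 map to write (unified) or auxiliary (split). The sketch below encodes this observation. The observation is derived from the table rows, not stated by TI as a formula, and `autoinit_sync` is a hypothetical name.

```python
def autoinit_sync(sync_mode: int, autoinit_sync_bits: int, split_mode: bool):
    """Return the set of paths whose autoinitialization waits on interrupts.

    sync_mode: SYNC MODE field, bits 7-6 (low bit = read / primary channel).
    autoinit_sync_bits: AUTOINIT SYNC field, bits 11-10 (same bit meaning).
    """
    both = sync_mode & autoinit_sync_bits & 0b11
    names = ("Primary Ch", "Auxiliary Ch") if split_mode else ("Read", "Write")
    return {names[b] for b in (0, 1) if both & (1 << b)}

print(autoinit_sync(0b11, 0b01, split_mode=True))   # {'Primary Ch'}
print(autoinit_sync(0b10, 0b01, split_mode=False))  # set()
```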



9.8.1 Fun With Link Pointers 

For many applications, it is sufficient to autoinitialize the DMA channel with 
the same data each time. In this case, the new link-pointer value points to 
the start of the same block of data containing the new link pointer as illus- 
trated in Figure 9-22. This particular example assumes a DMA channel that 
is not running in split mode. 

If you want, you can get fancier. The new link pointer may point to a new set 
of register values as illustrated in Figure 9-23. This may be continued to any 
level you like. Have fun! 
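Both link-pointer arrangements can be sketched by building unified-mode blocks in a dictionary that stands in for memory and then following the link pointers. This is an illustrative model. The block addresses and register values are placeholders, and the names `make_chain` and `follow` are hypothetical. A single block reproduces the self-referential case, and two blocks form a ring that alternates between register sets.

```python
def make_chain(block_addrs, register_sets):
    """Build memory holding unified-mode autoinitialization blocks.

    Each block is six register values followed by a link pointer naming the
    next block; the last block points back to the first. With one block,
    the link pointer refers to its own block (the self-referential case).
    """
    mem = {}
    n = len(block_addrs)
    for i, (base, regs) in enumerate(zip(block_addrs, register_sets)):
        if len(regs) != 6:
            raise ValueError("expected 6 register values before the link pointer")
        words = list(regs) + [block_addrs[(i + 1) % n]]
        for offset, word in enumerate(words):
            mem[base + offset] = word
    return mem

def follow(mem, start, hops):
    """List the block addresses used by successive autoinitializations."""
    visited, ptr = [], start
    for _ in range(hops):
        visited.append(ptr)
        ptr = mem[ptr + 6]          # the link pointer is the seventh word
    return visited

placeholder = [0, 0, 0, 0, 0, 0]    # hypothetical register values
single = make_chain([0x2000], [placeholder])
pair = make_chain([0x2000, 0x2100], [placeholder, placeholder])
print([hex(a) for a in follow(single, 0x2000, 3)])  # ['0x2000', '0x2000', '0x2000']
print([hex(a) for a in follow(pair, 0x2000, 4)])    # ['0x2000', '0x2100', '0x2000', '0x2100']
```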

Figure 9-22. Self-Referential Link Pointer
Figure 9-23. Referring to a New Link Pointer
9.9 DMA Coprocessor and Interrupts

All of the interrupts that the DMA coprocessor can see are first received by 
the CPU interrupt controller. If these interrupts are edge triggered, they are 
latched by the CPU in the appropriate interrupt-flag register. The edge-trig- 
gered interrupts are timer interrupts, DMA interrupts, and external inter- 
rupts that are configured as edge-triggered interrupts. Detailed information 
on interrupts is provided in Section 6.7 on page 6-23. 

For edge-triggered interrupts, when the interrupt controller determines 
that the interrupt a DMA channel is waiting on has been latched into the in- 
terrupt flag, the CPU clears the interrupt flag and sends an interrupt pulse 
to the DMA channel. The DMA channel latches the interrupt locally until it 
can service the interrupt. At that time, the latched interrupt is cleared by the 
DMA coprocessor for two cycles. 

Level-triggered interrupts that are generated by the communication ports 
and external interrupts that are configured as level-triggered interrupts are 
handled differently by the CPU interrupt controller. For level-triggered inter- 
rupts when the interrupt controller determines that the interrupt a DMA 
channel is waiting on has been received (recall that level-triggered inter- 
rupts are not latched by the CPU interrupt-controller), the CPU sends an in- 
terrupt pulse to the DMA channel. The DMA channel latches the interrupt 
locally until it can service the interrupt. At that time, the locally latched inter- 
rupt is cleared by the DMA coprocessor for two cycles. 

The interrupt reset signal generated by the DMA coprocessor after a DMA
interrupt is serviced has priority over the interrupt set signal. Thus,
the interrupt signal won't be continuously set even if the CPU is continuously 
sending the interrupt set signal. Hence, when the DMA-set priority scheme 
is used and a higher priority DMA channel is driven by continuous interrupt 
signals, the lower priority DMA channel can be serviced in between the high- 
er priority DMA services. 

The internal circuitry of the TMS320C40 guarantees proper operation be- 
tween a communication port that generates level-triggered interrupts and 
the DMA channel that is synchronizing with those level-triggered interrupts. 
However, when you synchronize the DMA channels with external interrupts, 
it is better that these interrupt lines be configured as edge-triggered inter- 
rupts to ensure that only one interrupt is recognized. 

Subsection 9.9.1 describes using interrupts to synchronize the DMA 
coprocessor. The interrupt mode for each channel is first selected in the 
DMA interrupt enable register, described with the CPU registers in subsec- 
tion 3.1.8 on page 3-8. 
9.9.1 Interrupts and Synchronization of DMA Channels

DMA channel transfers may be synchronized through the use of interrupts. 
The interrupt used is first selected by using the DMA interrupt enable regis- 
ter (subsection 3.1.8 on page 3-8). 

Table 9-5 (page 9-15) describes the relationship between the SYNC MODE
bits of the DMA channel control register and the four synchronization mech-
anisms performed:

□ No synchronization (SYNC MODE = 00₂)

□ Source synchronization (SYNC MODE = 01₂)

□ Destination synchronization (SYNC MODE = 10₂)

□ Source and destination synchronization (SYNC MODE = 11₂)

If the DMA split mode is selected, bits 6 and 7 of the DMA channel control 
register (page 9-15) are used to control channel synchronization: 

□ bit 6 controls primary channel synchronization 

□ bit 7 controls auxiliary channel synchronization 

No Synchronization (SYNC MODE = 00₂)

When SYNC MODE = 00₂, no synchronization is performed. The DMA per-
forms reads and writes whenever it has the priority to use the DMA bus.
All interrupts are ignored and, therefore, are considered to be globally
disabled. However, no bits in the DMA interrupt enable register are
changed. Figure 9-24 shows the synchronization mechanism when SYNC
MODE = 00₂.

If an external interrupt is used for DMA interrupt synchronization, the exter- 
nal pin must be configured as a DMA interrupt pin (the DMA interrupt enable 
register is explained in subsection 3.1.8 on page 3-8).



Figure 9-24. No DMA Synchronization 
Source Synchronization (SYNC MODE = 01₂)

When SYNC MODE = 01₂, the DMA coprocessor is synchronized to the
source (see Figure 9-25). A read will not be performed until an interrupt is 
received by the DMA coprocessor (this also applies to the DMA primary 
channel in split mode as shown in Figure 9-25). Then, all DMA interrupts 
are disabled globally. However, no bits in the DMA interrupt enable register 
are changed. 



Figure 9-25. DMA Source Synchronization 
(a) DMA channel in unified mode 



(b) Primary channel in split mode 
Destination Synchronization (SYNC MODE = 10₂)

When SYNC MODE = 10₂, the DMA coprocessor is synchronized to the
destination in unified mode. First, all interrupts are ignored until the read is
complete. (Though the DMA interrupts are considered to be globally dis-
abled, no bits in the DMA interrupt enable register are changed.) A write will
not be performed until an interrupt is received by the DMA coprocessor.
Figure 9-26 shows the synchronization mechanism when SYNC MODE =
10₂ in unified mode.

For the auxiliary channel in split mode, synchronization is similar to primary
channel synchronization. The exception is that for the primary channel, the
data is read from memory and written to a communication port output FIFO
(shown on the right side of Figure 9-26). The auxiliary channel can read
from a communication port and write data to a memory address.



Figure 9-26. DMA Destination Synchronization 
(a) Unified mode 



(b) Auxiliary channel in split mode 
Source and Destination Synchronization (SYNC MODE = 11₂)

When SYNC MODE = 11₂, a read is performed when a read interrupt is re-
ceived, and a write is performed on the write interrupt. If a write interrupt is
received before a read interrupt, the write interrupt is latched and the DMA
data write won't be executed until the read is completed. If DMA split mode
is selected, it reacts as two independent synchronizations for the primary
and auxiliary channels. Source and destination synchronization when
SYNC MODE = 11₂ is shown in Figure 9-27.
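As a compact summary of the four modes, the C sketch below decodes a 2-bit SYNC MODE value into "does the read wait for an interrupt?" and "does the write wait for an interrupt?". Treating the low bit as source synchronization and the high bit as destination synchronization is inferred from the 01₂/10₂/11₂ encodings above. Where the field sits in the DMA channel control register is not modeled, and the names (dma_sync_gates, sync_gates, write_may_proceed) are hypothetical.

```c
#include <stdbool.h>

/* Which transfer phases wait for a synchronizing interrupt, per 2-bit
   SYNC MODE value. Bit position in the control register not modeled. */
typedef struct {
    bool read_waits;   /* source synchronization: 01 or 11 */
    bool write_waits;  /* destination synchronization: 10 or 11 */
} dma_sync_gates;

static dma_sync_gates sync_gates(unsigned sync_mode) {
    dma_sync_gates g;
    g.read_waits  = (sync_mode & 0x1u) != 0;
    g.write_waits = (sync_mode & 0x2u) != 0;
    return g;
}

/* SYNC MODE = 11: a write interrupt that arrives early stays latched,
   and the write runs only after the read has completed. */
static bool write_may_proceed(bool write_int_latched, bool read_done) {
    return write_int_latched && read_done;
}
```

For example, sync_gates(2) reports that only the write waits, which matches destination synchronization in unified mode.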



Figure 9-27. DMA Source and Destination Synchronization 
9.10 TMS320C40 Timers



The TMS320C40 timer modules are general-purpose, 32-bit, timer/event 
counters, with two signaling modes and internal or external clocking (see 
Figure 9-28). The timer modules can be used to signal to the TMS320C40 
or to the external world at specified intervals, or to count external events. 
With an internal clock, the timer can be used to signal an external A/D con- 
verter to start a conversion, or it can interrupt the TMS320C40 DMA control- 
ler to begin a data transfer. With an external clock, the timer can count ex- 
ternal events and interrupt the CPU after a specified number of events. 
Available to each timer is an I/O pin that can be used as an input clock to 
the timer, an output clock signal, or a general-purpose I/O pin. 



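Figure 9-28. Timer Block Diagram

[Block diagram: the 32-bit period register (31-0) and the 32-bit counter
register (31-0) feed a comparator (Period = Counter?). The counter is
clocked by either the internal clock/2 or the external clock (through INV).
The comparator drives the pulse generator, which produces TSTAT and
Timer Out.]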



Three memory-mapped registers are used by each timer: 

□ Global-control register 

□ Period register 

□ Counter register 
The timer global-control register determines the operating mode of the
timer, monitors the timer status, and controls the function of the I/O pin of
the timer. The period register specifies the timer's signaling frequency. The
timer counter register contains the current value of the incrementing
counter. The counter can be incremented on the rising edge or the falling
edge of the input clock. The counter is zeroed whenever its value equals that
in the period register. The pulse generator generates two types of external
clock signals: pulse or clock. The memory map for the timer modules is
shown in Figure 9-29.

Figure 9-29. Memory-Mapped Timer Locations 
9.10.1 Timer Global-Control Register

The timer global-control register is a 32-bit register that contains the global
and port control bits for the timer module. Table 9-10 defines the register
bits, names, and functions. Bits 3-0 are the port control bits; bits
11-6 are the timer global control bits. Figure 9-30 shows the 32-bit regis-
ter. Note that at reset, all bits are set to 0 except for DATIN (set to the value
read on TCLK).

Figure 9-30. Timer Global-Control Register 
NOTE: xx = reserved bit, read as 0.
R = read, W = write. 
Table 9-10. Timer Global-Control Register Bits Summary

Bits    Name      Function
-----   --------  ---------------------------------------------------------
0       FUNC      FUNC controls the function of TCLK. If FUNC = 0, TCLK is
                  configured as a general-purpose digital I/O port. If
                  FUNC = 1, TCLK is configured as a timer pin (see
                  Figure 9-33 for a description of the relationship between
                  FUNC and CLKSRC).
1       I/O       If FUNC = 0 and CLKSRC = 0, TCLK is configured as a
                  general-purpose I/O pin. In this case, if I/O = 0, TCLK is
                  configured as a general-purpose input pin. If I/O = 1, TCLK
                  is configured as a general-purpose output pin.
2       DATOUT    DATOUT drives TCLK when the TMS320C40 is in I/O port mode.
                  DATOUT can also be used as an input to the timer.
3       DATIN     Data input on TCLK or DATOUT. A write has no effect.
4-5     Reserved  Read as 0.
6       GO        The GO bit resets and starts the timer counter. When
                  GO = 1 and the timer is not held, the counter is zeroed and
                  begins incrementing on the next rising edge of the timer
                  input clock. The GO bit is cleared on the same rising edge.
                  GO = 0 has no effect on the timer. Table 9-11 further
                  defines these bits.
7       HLD       Counter hold signal. When this bit is 0, the counter is
                  disabled and held in its current state. If the timer is
                  driving TCLK, the state of TCLK is also held. The internal
                  divide-by-two counter is also held so that the counter can
                  continue where it left off when HLD is set to 1. The timer
                  registers can be read and modified while the timer is being
                  held. RESET has priority over HLD. Table 9-11 shows the
                  effect of writing to GO and HLD.
8       C/P       Clock/pulse mode control. When C/P = 1, clock mode is
                  chosen, and the signaling of the status flag and external
                  output will have a 50 percent duty cycle. When C/P = 0, the
                  status flag and external output will be active for one H1
                  cycle during each timer period (see Figure 9-31).
9       CLKSRC    Specifies the source of the timer clock. When CLKSRC = 1,
                  an internal clock with frequency equal to one-half the H1
                  frequency is used to increment the counter. The INV bit has
                  no effect on the internal clock source. When CLKSRC = 0, an
                  external signal from the TCLK pin can be used to increment
                  the counter. The external clock is synchronized internally,
                  thus allowing external asynchronous clock sources that do
                  not exceed the specified maximum allowable external clock
                  frequency, which is less than f(H1)/2 (see Figure 9-33 for a
                  description of the relationship between FUNC and CLKSRC).
10      INV       Inverter control bit. If an external clock source is used
                  and INV = 1, the external clock is inverted as it goes into
                  the counter. If the output of the pulse generator is routed
                  to TCLK and INV = 1, the output is inverted before it goes
                  to TCLK (see Figure 9-28). If INV = 0, no inversion is
                  performed on the input or output of the timer. The INV bit
                  has no effect, regardless of its value, when TCLK is used
                  in I/O port mode.
11      TSTAT     This bit indicates the status of the timer. It tracks the
                  output of the uninverted TCLK pin. This flag sets a CPU
                  interrupt on a transition from 0 to 1. A write has no
                  effect.
12-31   Reserved  Read as 0.
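The bit numbers in Table 9-10 translate directly into masks. The C sketch below composes a control word that selects the internal clock, routes the timer output to TCLK, and resets and starts the counter (GO = 1 with HLD = 1, so the timer is not held). The macro and function names are hypothetical, not TI definitions, and writing the value to the memory-mapped register (Figure 9-29) is not shown.

```c
#include <stdint.h>

/* Masks from the bit numbers in Table 9-10 (names are illustrative). */
#define TGC_FUNC   (1u << 0)   /* 1 = TCLK is a timer pin */
#define TGC_IO     (1u << 1)   /* 1 = TCLK output, in I/O port mode */
#define TGC_DATOUT (1u << 2)
#define TGC_DATIN  (1u << 3)   /* read-only */
#define TGC_GO     (1u << 6)   /* reset and start the counter */
#define TGC_HLD    (1u << 7)   /* 0 = counter held */
#define TGC_CP     (1u << 8)   /* 1 = clock mode, 0 = pulse mode */
#define TGC_CLKSRC (1u << 9)   /* 1 = internal clock, f(H1)/2 */
#define TGC_INV    (1u << 10)
#define TGC_TSTAT  (1u << 11)  /* read-only */

/* Internal clock, timer output on TCLK, GO = 1 and HLD = 1 (not held).
   clock_mode != 0 selects clock mode (C/P = 1); otherwise pulse mode. */
static uint32_t tgc_start_internal(int clock_mode) {
    uint32_t v = TGC_FUNC | TGC_CLKSRC | TGC_GO | TGC_HLD;
    if (clock_mode)
        v |= TGC_CP;
    return v;
}
```

With these masks, tgc_start_internal(0) yields 0x2C1 (pulse mode) and tgc_start_internal(1) yields 0x3C1 (clock mode).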
Table 9-11 shows the result of a write using specified values of the GO and
HLD bits in the timer global-control register.

Table 9-11. Result of a Write of Specified Values of GO and HLD

GO        HLD
(Bit 6)   (Bit 7)   Result
-------   -------   --------------------------------------------------------
0         0         All timer operations are held. No reset is performed.
0         1         Timer proceeds from state before write.
1         0         All timer operations are held, including zeroing of the
                    counter. The GO bit is not cleared until the timer is
                    taken out of hold.
1         1         Timer resets and starts.
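Because HLD = 0 holds the timer, the GO/HLD combinations are easy to misread. The C sketch below encodes Table 9-11 row for row. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Table 9-11 as code: result of writing GO (bit 6) and HLD (bit 7). */
typedef enum {
    TIMER_HELD,            /* GO = 0, HLD = 0: held, no reset */
    TIMER_CONTINUES,       /* GO = 0, HLD = 1: proceeds from prior state */
    TIMER_HELD_GO_PENDING, /* GO = 1, HLD = 0: held; GO stays set */
    TIMER_RESET_AND_START  /* GO = 1, HLD = 1: resets and starts */
} timer_write_result;

static timer_write_result timer_write_effect(bool go, bool hld) {
    if (!hld)  /* HLD = 0 holds the timer */
        return go ? TIMER_HELD_GO_PENDING : TIMER_HELD;
    return go ? TIMER_RESET_AND_START : TIMER_CONTINUES;
}
```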



9.10.2 Timer Period and Counter Registers

The 32-bit timer period register is used to specify the frequency of the timer 
signaling. The timer counter register is a 32-bit register that is reset to zero 
whenever it increments to the value of the period register. Both registers are 
set to 0 at reset. The locations of the registers are shown in Figure 9-29 
on page 9-46. 

Certain boundary conditions affect timer operation, such as a zero in the 
period register and an overflow of the counter. These conditions are listed 
as follows: 

□ When the period and counter registers are zero, the operation of the
timer depends on the C/P mode selected. In pulse mode
(C/P = 0), TSTAT is set and remains set. In clock mode (C/P = 1), the
width of the cycle is 2/f(H1), and the external clocks are ignored.

□ When the counter register is not 0 and the period register = 0, the count- 
er will count, roll over to 0, and then behave as described immediately 
above (for both period and counter registers being zero). 

□ When the counter register is set to a value greater than the period
register, the counter may overflow when being incremented. Once the
counter reaches its maximum 32-bit value (0FFFF FFFFh), it simply
clocks over to 0 and continues.
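The increment rules above (zero the counter when it increments to the period value; otherwise count up, wrapping from 0FFFF FFFFh to 0) can be sketched as a one-step function in C. This is a hypothetical helper that assumes a nonzero period register; the zero-period special case and TSTAT/pulse generation are not modeled.

```c
#include <stdint.h>

/* One counter increment. Assumes period != 0 (the zero-period case
   described above is not modeled). uint32_t arithmetic wraps at 2^32,
   matching the roll-over from 0FFFFFFFFh to 0. */
static uint32_t timer_next(uint32_t counter, uint32_t period) {
    uint32_t next = counter + 1u;
    return (next == period) ? 0u : next;
}
```

With period = 3, the counter cycles 0, 1, 2, 0, ..., so it returns to 0 once every three timer clocks. Starting above the period (for example at 5), it keeps counting up, wraps to 0, and then joins the normal cycle.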

Writes from the peripheral bus override register updates from the counter 
and new status updates to the control register. 

9.10.3 Timer Pulse Generation 

The timer pulse generator (see Figure 9-28) can generate several different
external signals. These signals may be inverted with the INV bit. The two
basic modes are pulse mode and clock mode, as shown in Figure 9-31. In
both modes, an internal clock source has a frequency of f(H1)/2, and an
external clock source has a maximum frequency of less than f(H1)/2. Refer
to timer timing in Chapter 14. In pulse mode (C/P = 0), the width of the pulse
is 1/f(H1).
Figure 9-31. Timer Timing

(a) TSTAT and Timer Output (INV = 0) When C/P = 0 (Pulse Mode)
    [Timing diagram; marked intervals: 1/f(H1), 2/f(H1), 1/f(CLKSRC),
    period register/f(CLKSRC)]

(b) TSTAT and Timer Output (INV = 0) When C/P = 1 (Clock Mode)
    [Timing diagram; marked intervals: 2/f(H1), 1/f(CLKSRC),
    period register/f(CLKSRC), 2 x period register/f(CLKSRC)]



The rate of timer signaling is determined by the frequency of the timer input 
clock and the period register. The following equations are valid with either 
an internal or an external timer clock: 

f(pulse mode) = f(timer clock) / period register 

f(clock mode) = f(timer clock) / (2 x period register) 
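A direct C transcription of the two equations follows. It assumes a nonzero period register. In the usage note, the 25-MHz H1 value is only an assumed example, so the internal timer clock is f(H1)/2 = 12.5 MHz.

```c
/* Signaling rate from the equations above. Assumes period != 0.
   For the internal clock, f_timer_clock = f(H1) / 2. */
static double f_pulse_mode(double f_timer_clock, unsigned long period) {
    return f_timer_clock / (double)period;
}

static double f_clock_mode(double f_timer_clock, unsigned long period) {
    return f_timer_clock / (2.0 * (double)period);
}
```

For example, with a 12.5-MHz timer clock and a period register of 125, pulse mode signals at 100 kHz and clock mode at 50 kHz.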






9.10.4 Timer Operation Modes 



The timer can receive its input and send its output in several different
modes, depending upon the setting of CLKSRC, FUNC, and I/O. The
four timer modes of operation are defined as follows:

□ If CLKSRC = 1 and FUNC = 0, the timer input comes from the internal 
clock. The internal clock is not affected by the INV bit (bit 10 as shown 
in Figure 9-30 on page 9-47). In this mode, TCLK is connected to the 
I/O port control and can be used as a general-purpose I/O pin (see 
Figure 9-32). If I/O = 0, TCLK is configured as a general-purpose input 
pin whose state can be read in DATIN. DATOUT has no effect on TCLK 
or DATIN. If I/O = 1, TCLK is configured as a general-purpose output
pin. DATOUT is placed on TCLK and can be read in DATIN. 



Figure 9-32. Timer I/O Port Configurations

(a) I/O = 0: TCLK (external) is an input whose state is read in DATIN.
(b) I/O = 1: DATOUT drives TCLK, and the value is read in DATIN.
□ If CLKSRC = 1 and FUNC = 1, the timer input comes from the internal
clock, and the timer output goes to TCLK. This value may be inverted 
by using INV, and the value output on TCLK can be read in DATIN. 

□ If CLKSRC = 0 and FUNC = 0, the timer is driven according to the status 
of the I/O bit. If I/O = 0, the timer input comes from TCLK. This value can 
be inverted by using INV, and the value of TCLK can be read in DATIN. 
If I/O = 1, TCLK is an output pin. Then, TCLK and the timer are both
driven by DATOUT. All 0-to-1 transitions of DATOUT increment the counter.
INV has no effect on DATOUT. The value of DATOUT can be read in 
DATIN. 

□ If CLKSRC = 0 and FUNC = 1, TCLK drives the timer. If INV = 0, all 0-to-1
transitions of TCLK increment the counter. If INV = 1, all 1-to-0 transi-
tions of TCLK increment the counter. The value of TCLK can be read
in DATIN. 
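The four cases above can be summarized as two questions: what increments the counter, and is TCLK driven by the chip? The C sketch below answers both from CLKSRC, FUNC, and I/O. It is a restatement of the list for illustration; the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { CNT_INTERNAL_CLOCK, CNT_TCLK, CNT_DATOUT } counter_source;

typedef struct {
    counter_source src;   /* what increments the counter */
    bool tclk_is_output;  /* TCLK driven by the chip (else an input) */
} timer_mode;

static timer_mode timer_mode_of(bool clksrc, bool func, bool io) {
    timer_mode m;
    if (clksrc)
        m.src = CNT_INTERNAL_CLOCK;  /* f(H1)/2; INV has no effect */
    else if (!func && io)
        m.src = CNT_DATOUT;          /* 0-to-1 transitions of DATOUT */
    else
        m.src = CNT_TCLK;            /* TCLK; INV selects the edge */
    /* FUNC = 1: TCLK is a timer pin (output with the internal clock,
       input with an external clock). FUNC = 0: I/O sets the direction. */
    m.tclk_is_output = func ? clksrc : io;
    return m;
}
```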

Figure 9-33 shows the four timer modes of operation. 
Figure 9-33. Timer Modes as Defined by CLKSRC and FUNC

(a) CLKSRC = 1 (Internal), FUNC = 0 (I/O Pin): the internal clock drives
    Timer In; TCLK is connected to the I/O port control.
(b) CLKSRC = 1 (Internal), FUNC = 1 (Timer Pin): the internal clock drives
    Timer In; Timer Out drives TCLK, which can be read in DATIN.
(c) CLKSRC = 0 (External), FUNC = 0 (I/O Pin): TCLK is connected to the
    I/O port control, which drives Timer In.
(d) CLKSRC = 0 (External), FUNC = 1 (Timer Pin): TCLK drives Timer In and
    can be read in DATIN.
In each panel, the timer produces TSTAT.
Chapter 10

Pipeline Operation 



Two characteristics of the TMS320C40 that contribute to its high perform- 
ance are pipelining and concurrent I/O and CPU operation. 

Five functional units control TMS320C40 operation: fetch, decode, read, ex-
ecute, and DMA. Pipelining is the overlapping, or parallel operation, of the
fetch, decode, read, and execute levels of a basic instruction.

By performing input/output operations, the DMA coprocessor reduces the 
need for the CPU to do so, thereby decreasing pipeline interference and en- 
hancing the CPU's computational throughput. 

Major topics discussed in this chapter are as follows: 



Section Page 

10.1 Pipeline Structure 10-2 

10.2 Pipeline Conflicts 10-4 

■ Branch Conflicts 10-4 

■ Register Conflicts 10-8 

■ Memory Conflicts 10-11 

10.3 Resolving Memory Conflicts 10-18 

10.4 Clocking of Memory Accesses 10-20 

■ Program Fetches 10-20

■ Data Loads and Stores 10-21 
10.1 Pipeline Structure



The five major units of the TMS320C40 pipeline structure and their functions
are as follows:

Fetch Unit (F)           Fetches the instruction words from memory and
                         updates the program counter (PC).

Decode Unit (D)          Decodes the instruction word and performs address
                         generation. Also controls any modification of the
                         auxiliary registers and the stack pointer.

Read Unit (R)            If required, reads the operands from memory.

Execute Unit (E)         If required, reads the operands from the register
                         file, performs the necessary operation, and writes
                         results to the register file. If required, results
                         of previous operations are written to memory.

DMA Coprocessor (DMA)    Reads and writes memory.

A basic instruction has four levels: fetch, decode, read, and execute.
Figure 10-1 illustrates these four levels of the pipeline structure. The levels
are indexed according to instruction and execution cycle. The perfect over-
lap in the pipeline, where all four units operate in parallel, occurs at cycle
(m). Those levels about to be executed are at m+1, and those just executed
are at m-1. The TMS320C40 pipeline control allows a high-speed execu-
tion rate of one instruction per cycle. It also manages pipeline conflicts so
that they are transparent to the user. You do not need to take any special
precautions to guarantee correct operation.
Figure 10-1. TMS320C40 Pipeline Structure

CYCLE   F   D   R   E
m-3     W   -   -   -
m-2     X   W   -   -
m-1     Y   X   W   -
m       Z   Y   X   W     <- Perfect overlap
m+1     -   Z   Y   X
m+2     -   -   Z   Y
m+3     -   -   -   Z

Notes: 1) W, X, Y, and Z represent instructions.
       2) F, D, R, E = fetch, decode, read, and execute, respectively.
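The ideal pipeline in Figure 10-1 follows a simple rule: instruction i occupies stage s during cycle i + s. The C sketch below applies that rule. It numbers W, X, Y, Z as instructions 0-3 and takes cycle 0 as m-3. It deliberately ignores every conflict described in Section 10.2, and the function names are hypothetical.

```c
/* Stages: 0 = F, 1 = D, 2 = R, 3 = E. Returns the index of the
   instruction in stage s during cycle t, or -1 if the stage is empty.
   Conflict-free model only. */
static int stage_occupant(int t, int s, int n_instr) {
    int i = t - s;
    return (i >= 0 && i < n_instr) ? i : -1;
}

/* Cycles from the first fetch to the last execute with perfect overlap:
   four levels for the first instruction plus one per additional one. */
static int ideal_cycles(int n_instr) {
    return n_instr > 0 ? n_instr + 3 : 0;
}
```

At t = 3 (cycle m), stages F, D, R, and E hold Z, Y, X, and W. Four instructions span the seven cycles m-3 through m+3 shown in the figure.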



Priorities from highest to lowest have been assigned to each of the function- 
al units as follows: 

□ DMA (if configured as highest priority) 

□ Execute 

□ Read 

□ Decode 

□ Fetch 

□ DMA (if configured as lowest priority). 

When the processing of an instruction is ready to pass to the next higher 
pipeline level, but that level is not ready to accept a new input, a pipeline con- 
flict occurs. In this case, the lower priority unit waits until the higher priority 
unit completes its currently executing function. 
10.2 Pipeline Conflicts

The pipeline conflicts of the TMS320C40 can be grouped into the following 
main categories: 

Branch Conflicts       Involve most of those instructions or operations
                       that read and/or modify the PC.

Register Conflicts     Involve delays that can occur when reading from
                       or writing to registers that are used for address
                       generation.

Memory Conflicts       Occur when the internal units of the
                       TMS320C40 compete for memory resources.

Each of these three types is discussed in the following sections. Examples
are included. Note that in these examples, when data is refetched or an
operation is repeated, the symbol representing the stage of the pipeline is
appended with a number. For example, if a fetch is performed again, the
initial fetch is labeled F1 and the refetch is labeled F2. When an access is
detained multiple cycles because of a not ready, RDY with an overbar
indicates not ready, and RDY without an overbar indicates ready.

10.2.1 Branch Conflicts 

10.2.1.1 Standard Branches 

The first class of pipeline conflicts occurs with standard (nondelayed)
branches, i.e., BR, Bcond, DBcond, CALL, IDLE, RPTB, RPTS, RETIcond,
RETScond, interrupts, and reset. Conflicts arise with these instructions and
operations because during their execution, the pipeline is used only for the
completion of the operation; other information fetched into the pipeline is
discarded or refetched, or the pipeline is inactive. This is referred to as flush-
ing the pipeline. Flushing the pipeline is necessary in these cases to guaran-
tee that portions of succeeding instructions do not inadvertently get partially
executed. TRAPcond and CALLcond are classified differently from the oth-
er types of branches and are considered later.

Example 10-1 shows the code and pipeline operation for a standard 
branch. Note that one dummy fetch is performed (F1), and then after the 
branch address is available, a new fetch (F2) is performed. This dummy 
fetch affects the cache. 
Example 10-1. Standard Branch

        BR      THREE       ; Unconditional branch
        MPYF                ; Not executed
        ADD                 ; Not executed
        SUBF                ; Not executed
        AND                 ; Not executed

THREE   OR                  ; Fetched after BR is fetched
        STI

PIPELINE OPERATION

F       D       R       E
BR                              Fetch held for new PC value
                                THREE -> PC
OR      (nop)   (nop)   (nop)
STI     OR      (nop)   (nop)



RPTS and RPTB both flush the pipeline, allowing the RS, RE, and RC regis- 
ters to be loaded at the proper time relative to the flow of the pipeline. If these 
registers are loaded without the use of RPTS or RPTB, no flushing of the 
pipeline occurs. If none of the repeat modes are being used, RS, RE, and 
RC may be used as general-purpose 32-bit registers without any pipeline 
conflicts occurring. In cases such as the nesting of RPTB due to nested in- 
terrupts, it may be necessary to load and store these registers directly while 
using the repeat modes. Since up to four instructions can be fetched before 
entering the repeat mode, loads should be followed by a branch to flush the 
pipeline. If the RC is changing when an instruction is loading it, the direct 
load takes priority over the modification made by the repeat mode logic. 
10.2.1.2 Delayed Branches

Delayed branches are implemented to guarantee the fetching of the next 
three instructions. The delayed branches include BRD, BcondD, and 
DBcondD. Example 10-2 shows the code and pipeline operation for a 
delayed branch. 

Example 10-2. Delayed Branch

        BRD     THREE       ; Unconditional delayed branch
        MPYF                ; Executed
        ADDF                ; Executed
        SUBF                ; Executed
        AND                 ; Not executed

THREE   MPYF                ; Fetched after SUBF is fetched

PIPELINE OPERATION

PC      F       D       R       E
n       BRD     -       -       -
n+1     MPYF    BRD     -       -
n+2     ADDF    MPYF    BRD     -
n+3     SUBF    ADDF    MPYF    BRD      THREE -> PC (no execute delay)
THREE   MPYF    SUBF    ADDF    MPYF
10.2.1.3 Delayed Branches With Annul Option

In addition to standard and delayed branches, the 'C40 supports delayed
branches with an annulling option. These instructions include BcondAT
(branch conditional, annul if true) and BcondAF (branch conditional, annul
if false). The status of the condition (whether the specified cond is found
true or false) controls whether or not a branch is performed (as in a delayed
branch). The annulling operation cancels the effect of any operation per-
formed in the execute phase of the three instructions following the BcondAT
or BcondAF.

□ If the condition is true, BcondAT annuls the effect of any operation per- 
formed in the execute phase of the three instructions that follow. 

□ If the condition is false, BcondAF annuls the effect of any operation per- 
formed in the execute phase of the three instructions that follow. 
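The two annul rules reduce to a small truth table, sketched below in C. Following Example 10-3, the branch is taken when the condition is true for both forms; only the annulment differs. The names are hypothetical.

```c
#include <stdbool.h>

typedef enum { ANNUL_IF_TRUE, ANNUL_IF_FALSE } annul_kind; /* AT, AF */

/* For both BcondAT and BcondAF, the branch is taken when cond is true. */
static bool branch_taken(bool cond) { return cond; }

/* True if the execute phase of the three following instructions is
   annulled. Their decode and read phases (for example, *++AR2 updating
   AR2) still take effect. */
static bool annuls_next_three(annul_kind kind, bool cond) {
    return (kind == ANNUL_IF_TRUE) ? cond : !cond;
}
```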

Example 10-3 uses both BcondAT and BcondAF. 

Example 10-3. Using BcondAF and BcondAT Instructions

top:    LDI     *AR1,R0     ; If negative, branch and
        BNEGAT  bottom      ; annul the execute phase
        ADDI    *++AR2,R3   ; of ADDI, MPYF, and NOT.
        MPYF                ; Otherwise, don't annul and
        NOT                 ; continue with SUBF.
        SUBF

bottom: SUBI    1,R0        ; If not negative, branch and
        BNNAF   top         ; do not annul the execute
        ADDI    *++AR2,R3   ; phase of ADDI, MPYF, and
        MPYF                ; NOT. Otherwise, annul ADDI,
        NOT                 ; MPYF, and NOT, and continue
        XOR                 ; with XOR.



At the start of Example 10-3, if the result of the load is negative (a true 
condition), the BNEGAT instruction causes a branch and also an annulment 
of the execute phase of the three instructions that follow it. As a result, the 
execute phase of the ADDI instruction does not occur, and register R3 is not 
updated by addition. However, the incrementing of AR2 and the reading of 
the data at the corresponding address do occur because these operations
are in the decode and read phases of the pipeline, respectively, and thus are 
not annullable. 
In short, operations that are annullable are:

□ all writes to the register file that occur in the execute phase (ADDs, LDs,
etc., but not LDA, LDPK, etc.).

□ all stores to memory. 

10.2.2 Register Conflicts 

Register conflicts involve the reading or writing of registers used for ad-
dressing purposes. These registers are AR0-AR7, IR0, IR1, BK, DP, and
SP. These conflicts occur when the pertinent register is not ready to be used. 
If an instruction writes to one of these registers, the decode unit cannot use 
that same register until the write is complete, i.e., until instruction execution 
is completed. 

In Example 10-4, an auxiliary register is loaded, and the same auxiliary reg-
ister is used on the next instruction. Since the decode stage needs the result
of the write to the auxiliary register, the decode of this second instruction is
delayed two cycles. Every time the decode is delayed, a refetch of the pro-
gram word is performed; i.e., the first fetch of ADDF is at F1, followed by F2
and F3 (the final fetch). Since these are actual refetches, they can cause
not only conflicts with the DMA controller but also cache hits and misses.
(If the MPYF instruction used a different auxiliary register than the LDI
instruction, no delay would occur.)
Example 10-4. Write to an AR Followed by an AR for Address Generation

        LDI     7,AR2       ; 7 -> AR2
NEXT    MPYF    *AR2,R0     ; Decode delayed 2 cycles
        ADDF
        FLOAT

PIPELINE OPERATION

PC      F       D       R       E
n       LDI     -       -       -
n+1     MPYF    LDI     -       -
n+2     ADDF    MPYF    LDI     -             Decode/address generation
n+2     ADDF    MPYF    (nop)   LDI 7,AR2     held for a new AR value;
n+2     ADDF    MPYF    (nop)   (nop)         AR2 loaded by LDI
n+3     FLOAT   ADDF    MPYF    (nop)



The case for reads of these registers is similar to the case for writes. If an
instruction must read registers AR0-AR7 or SP, the use of those particular
registers by the decode for the following instruction is delayed until the read
is complete. The registers are read at the start of the execute cycle and
therefore require only a one-cycle delay of the following decode. For the
other four registers (IR0, IR1, BK, and DP), no delay is incurred upon a read.
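The write and read rules above can be collected into one decode-delay function. The C sketch below assumes, as in Examples 10-4 and 10-5, that the instruction touching the register immediately precedes the instruction that uses it for address generation. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Address-generation registers named in this section. */
typedef enum { REG_AR0, REG_AR1, REG_AR2, REG_AR3, REG_AR4, REG_AR5,
               REG_AR6, REG_AR7, REG_IR0, REG_IR1, REG_BK, REG_DP,
               REG_SP } addr_reg;

/* Extra decode cycles for an instruction that uses r for address
   generation when the immediately preceding instruction wrote or read r:
   2 after a write; 1 after a read of AR0-AR7 or SP; 0 after a read of
   IR0, IR1, BK, or DP. */
static int decode_delay(addr_reg r, bool prev_writes, bool prev_reads) {
    if (prev_writes)
        return 2;
    if (prev_reads && (r <= REG_AR7 || r == REG_SP))
        return 1;
    return 0;
}
```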
In Example 10-5, two auxiliary registers are added together with the result
going to an extended-precision register. The next instruction uses one of the
same auxiliary registers as an address register. If the MPYF instruction used
an AR register other than AR0 or AR2, no delay would occur.

Example 10-5. A Read of ARs Followed by ARs for Address Generation

        ADDI    AR0,AR2,R1  ; AR0 + AR2 -> R1
NEXT    MPYF    *++AR2,R0   ; Decode delayed 1 cycle
        ADDF
        FLOAT

PIPELINE OPERATION

PC      F       D       R       E
n       ADDI    -       -       -
n+1     MPYF    ADDI    -       -
n+2     ADDF    MPYF    ADDI    -                  Decode/address generation
                                                   held until AR is read
n+2     ADDF    MPYF    (nop)   ADDI AR0,AR2,R1    ARs read
n+3     FLOAT   ADDF    MPYF    (nop)



The DBR (decrement and branch) instruction's use of auxiliary registers for 
loop counters is treated the same as if the use were for addressing. There- 
fore, the operation shown in the two previous examples can also occur for 
this instruction. 
10.2.3 Memory Conflicts

Memory conflicts can occur when the memory bandwidth of a physical 
memory space is exceeded. For example, RAM blocks 0 and 1 and the 
ROM block can support only two accesses every cycle. The external inter- 
face can support only one access per cycle. Some conditions under which 
memory conflicts can be avoided are discussed in Section 10.3. 

Memory pipeline conflicts consist of the following four types:

Program Wait               A program fetch is prevented from beginning.

Program Fetch Incomplete   A program fetch has begun but is not yet
                           complete.

Execute Only               An instruction sequence requires three CPU
                           data accesses in a single cycle.

Hold Everything            A primary or expansion bus operation must
                           complete before another one can proceed.

These four types of memory conflicts are illustrated in examples and dis- 
cussed in the paragraphs that follow. 

Program Wait 

Two conditions can prevent the program fetch from beginning: 

□ The start of a CPU data access when 

■ Two CPU data accesses are made to an internal RAM or ROM 
block, and a program fetch from the same block is necessary. 

■ One of the external ports is starting a CPU data access, and a pro- 
gram fetch from the same port is necessary. 

□ A multicycle CPU data access or DMA data access over the external 
bus is needed. 
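The Program Wait conditions can be read as a capacity check. An internal RAM block or the ROM block supports two accesses per cycle, and an external port supports one. The C sketch below applies that check to the block or port the program fetch needs. It assumes that any multicycle external access in progress is on the same port as the fetch (as in Example 10-7). The names are hypothetical.

```c
#include <stdbool.h>

typedef enum { MEM_INTERNAL_BLOCK, MEM_EXTERNAL_PORT } mem_kind;

/* kind: the RAM/ROM block or external port the program fetch needs.
   data_accesses: CPU data accesses starting this cycle in that same
   block or port. multicycle_busy: a multicycle CPU or DMA data access
   over that external port is needed (assumed to be the same port). */
static bool program_fetch_must_wait(mem_kind kind, int data_accesses,
                                    bool multicycle_busy) {
    int capacity = (kind == MEM_INTERNAL_BLOCK) ? 2 : 1;
    if (data_accesses >= capacity)
        return true;  /* no access slot left for the fetch */
    return kind == MEM_EXTERNAL_PORT && multicycle_busy;
}
```

Example 10-6 corresponds to program_fetch_must_wait(MEM_INTERNAL_BLOCK, 2, false), which returns true because both of RAM block 0's accesses are taken by *AR0 and *AR1.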






Example 10-6 illustrates a program wait until a CPU data access 
completes. In this case, *AR0 and *AR1 are both pointing to data in RAM 
block 0, and the MPYF instruction will be fetched from RAM block 0. This 
results in the conflict shown. Since no more than two accesses can be made 
to RAM block 0 in a single cycle, the program fetch cannot begin and must 
wait until the CPU data accesses are complete. 

Example 10-6. Program Wait Until CPU Data Access Completes

        ADDF3   *AR0,*AR1,R0
        FIX
        MPYF
        ADDF3
        NEGB

PIPELINE OPERATION

PC      F       D       R       E
n       ADDF3   -       -       -
n+1     FIX     ADDF3   -       -
n+2     (wait)  FIX     ADDF3   -                     Fetch held until ARs are read
n+2     MPYF    (nop)   FIX     ADDF3 *AR0,*AR1,R0    ARs read
n+3     ADDF3   MPYF    (nop)   FIX
n+4     NEGB    ADDF3   MPYF    (nop)



Example 10-7 shows a program wait due to a multicycle CPU data access
or a multicycle DMA access. The ADDF, MPYF, and SUBF are fetched from
some portion in memory other than the external port the DMA requires. The
DMA begins a multicycle access. The program fetch corresponding to the
CALL is made to the same external port the DMA is using.

Even if the DMA were configured as the lowest priority, a multicycle access
cannot be aborted. The program fetch must therefore wait until the DMA
access completes.






Example 10-7. Program Wait Due to Multicycle Access 



PIPELINE OPERATION

PC  | F      | D      | R      | E
n   | ADDF   | -      | -      | -
n+1 | MPYF   | ADDF   | -      | -
n+2 | SUBF   | MPYF   | ADDF   | -       <- 2-cycle DMA access starts
n+3 | (wait) | SUBF   | MPYF   | ADDF    <- DMA access finishes; CALL fetch waits
n+3 | CALL   | (nop)  | SUBF   | MPYF
n+4 |        | CALL   | (nop)  | SUBF


Program Fetch Incomplete 

A program-fetch-incomplete conflict occurs when a program fetch takes more
than one cycle to complete because of wait states. In Example 10-8, the MPYF
and ADDF are fetched from memory that supports single-cycle accesses, while
the SUBF is fetched from memory requiring one wait state. One situation that
produces this conflict is a fetch across a bank boundary on the primary port.

Example 10-8. Multicycle Program Memory Fetches 



PIPELINE OPERATION

PC  | F      | D      | R      | E
n   | MPYF   | -      | -      | -
n+1 | ADDF   | MPYF   | -      | -
n+2 | SUBF   | ADDF   | MPYF   | -       <- 1 wait state required
n+2 | SUBF   | (nop)  | ADDF   | MPYF
n+3 | ADDI   | SUBF   | (nop)  | ADDF
Execute Only

The Execute Only type of memory pipeline conflict occurs when a sequence 
of instructions requires three CPU data accesses in a single cycle. There 
are two cases in which this occurs: 

□ An instruction performs a store and is followed by an instruction that 
does two memory reads. 

□ An instruction performs two stores and is followed by an instruction that 
performs at least one memory read. 

The first case is shown in Example 10-9. Because this sequence requires
three data memory accesses and only two are available, only the execute
phase of the pipeline is allowed to proceed. The dual reads required by the
LDF || LDF are delayed one cycle. Note that a refetch of the next instruction
can occur.

Example 10-9. Single Store Followed by Two Reads 

STF R0,*AR1       ; R0 -> *AR1
LDF *AR2,R1       ; *AR2 -> R1 in parallel with
|| LDF *AR3,R2    ; *AR3 -> R2

PIPELINE OPERATION

PC  | F         | D         | R         | E
n   | STF       | -         | -         | -
n+1 | LDF||LDF  | STF       | -         | -
n+2 | W         | LDF||LDF  | STF       | -
n+3 | X         | W         | LDF||LDF  | STF R0,*AR1    <- Write must complete before the 2 reads can complete
n+4 | X         | W         | LDF||LDF  | (nop)
n+4 | Y         | X         | W         | LDF||LDF *AR2,R1 and *AR3,R2
Example 10-10 shows a parallel store followed by a single load or read.
Because the parallel store uses both available data accesses, the next CPU
data memory read must wait a cycle before beginning. One program memory
refetch may occur.

Example 10-10. Parallel Store Followed by Single Read 

STF R0,*AR0       ; R0 -> *AR0 in parallel with
|| STF R2,*AR1    ; R2 -> *AR1
ADDF @SUM,R1      ; R1 + @SUM -> R1
IACK
ASH

PIPELINE OPERATION

PC  | F         | D         | R         | E
n   | STF||STF  | -         | -         | -
n+1 | ADDF      | STF||STF  | -         | -
n+2 | IACK      | ADDF      | STF||STF  | -
n+3 | ASH       | IACK      | ADDF      | STF||STF R0,*AR0 and R2,*AR1   <- Read must wait until the writes are complete
n+4 | ASH       | IACK      | ADDF      | (nop)
n+4 | -         | ASH       | IACK      | ADDF
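Both Execute Only cases reduce to one counting rule, sketched below in Python. This is a simplified model under the assumption stated in the text (at most two CPU data accesses per cycle, stores in the E phase and reads in the R phase); the function name is hypothetical.

```python
# Simplified model: at most two CPU data accesses per cycle. Stores are
# made in the execute (E) phase, reads in the read (R) phase.
MAX_CPU_DATA_ACCESSES = 2


def execute_only_conflict(stores_in_e, reads_in_r):
    """True if only the E phase may proceed, so the R-phase reads slip a cycle."""
    return stores_in_e + reads_in_r > MAX_CPU_DATA_ACCESSES


print(execute_only_conflict(1, 2))  # Example 10-9: STF, then LDF || LDF -> True
print(execute_only_conflict(2, 1))  # Example 10-10: STF || STF, then ADDF @SUM -> True
print(execute_only_conflict(1, 1))  # one store and one read fit -> False
```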
Hold Everything

There are three types of Hold Everything memory pipeline conflicts: 

□ A CPU data load or store cannot be performed because an external port 
is busy. 

□ An external load takes more than one cycle. 

□ Conditional calls and traps. 

The first type of Hold Everything conflict occurs when one of the external
ports is busy because of an access that has started but is not complete. In
Example 10-11, the first store is a two-cycle store. The CPU writes the data
to an external port, and the port control then takes two cycles to complete
the data write. The LDF is a read over the same external port. Because the
store is not complete, the CPU continues to retry the LDF until the port is
available.

Example 10-11. Busy External Port 

STF R0,@DMA1
LDF @DMA2,R0

PIPELINE OPERATION

PC  | F    | D    | R      | E
n   | STF  | -    | -      | -
n+1 | LDF  | STF  | -      | -
n+2 | W    | LDF  | STF    | -
n+2 | W    | LDF  | (nop)  | STF     <- 2-cycle external bus write access starts
n+2 | W    | LDF  | (nop)  | (nop)   <- write access completes
n+3 | X    | W    | LDF    | (nop)
n+4 | Y    | X    | W      | LDF



The second type of Hold Everything conflict involves multicycle data reads. 
The read has begun and continues until completed. In Example 10-12, the 
LDF is performed from an external memory that requires several cycles to 
complete. 
Example 10-12. Multicycle Data Reads

LDF @DMA,R0

PIPELINE OPERATION

PC  | F          | D    | R    | E
n   | LDF        | -    | -    | -
n+1 | I          | LDF  | -    | -
n+2 | J          | I    | LDF  | -       <- 2-cycle external bus read access starts
n+3 | K (dummy)  | J    | I    | LDF     <- read access completes
n+3 | K          | J    | I    | LDF

The final type of Hold Everything conflict deals with conditional calls and 
traps, which are different from the other branch instructions. Whereas the 
other branch instructions are conditional loads, the conditional calls and 
traps are conditional stores, which take one more cycle than a conditional 
branch (see Example 10-13). The added cycle is used to push the return 
address after the call condition is evaluated. 



Example 10-13. Conditional Calls and Traps 







PIPELINE OPERATION

PC            | F         | D         | R         | E
n             | CALLcond  | -         | -         | -
n+1           | I         | CALLcond  | -         | -
n+1           | (nop)     | (nop)     | CALLcond  | -
n+1           | (nop)     | (nop)     | (nop)     | CALLcond
n+1           | (nop)     | (nop)     | (nop)     | CALLcond   <- PC store
n+2/CALLaddr  | I         | (nop)     | (nop)     | (nop)
10.3 Resolving Memory Conflicts

If program fetches and data accesses are arranged so that the resources in
use cannot provide the necessary bandwidth, the program fetch is delayed
until the data access is complete. Certain configurations of program fetches
and data accesses allow the TMS320C40 to achieve maximum throughput.

Table 10-1 shows how many accesses can be performed from the different
memory spaces when a program fetch and a single data access are both
necessary, while still achieving maximum performance (one cycle). Four cases
achieve this one-cycle maximum.

Table 10-1. One Program Fetch and One Data Access for Maximum Performance 



Case No. | Global Bus Accesses | Accesses From Dual-Access Internal Memory | Local Bus or Peripheral Accesses
1        | 1                   | 1                                          | -
2        | 1                   | -                                          | 1
3        | -                   | 2 from any combination of internal memory  | -
4        | -                   | 1                                          | 1
Table 10-2 shows how many accesses can be performed from the different
memory spaces when a program fetch and two data accesses are necessary,
while still achieving maximum performance (one cycle). Six cases (1 through
6) achieve this maximum; Cases 7 and 8 add a DMA access to the same pattern.

Table 10-2. One Program Fetch and Two Data Accesses for Maximum Performance 



Case No. | Global Bus Accesses | Accesses From Dual-Access Internal Memory                                   | Local or Peripheral Bus Accesses
1        | 1                   | 2 from any combination of internal memory                                   | -
2†       | 1 program           | 1 data                                                                      | 1 data
3†       | 1 data              | 1 data                                                                      | 1 program
4        | -                   | 2 from same internal memory block and 1 from a different internal memory block | -
5        | -                   | 3 from different internal memory blocks                                     | -
6        | -                   | 2 from any combination of internal memory                                   | 1
7        | 1 program           | 2 data                                                                      | 1 DMA
8        | 1 DMA               | 2 data                                                                      | 1 program

† For Cases 2 and 3, see Three-Operand Instruction Memory Reads on
page 10-21.
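The resource limits implied by Tables 10-1 and 10-2 can be sketched as a rough Python check. This is an approximation, not TI's rule set: it assumes one access per cycle on each external bus (global, and local or peripheral) and two per dual-access internal block, and it ignores the ordering restrictions that the footnote for Cases 2 and 3 points to. A True result is therefore necessary but not sufficient for single-cycle operation. The resource strings and block names are hypothetical.

```python
from collections import Counter


def fits_in_one_cycle(accesses):
    """Rough check of whether a set of accesses can all complete in one cycle.

    accesses -- list of (resource, block) pairs; resource is "global",
    "local" (local bus or peripheral), or "internal"; block names an
    internal memory block and is ignored for the two external buses.
    Assumed limits: one access per external bus, two per internal block.
    """
    load = Counter()
    for resource, block in accesses:
        load[(resource, block if resource == "internal" else None)] += 1
    for (resource, _), count in load.items():
        limit = 2 if resource == "internal" else 1
        if count > limit:
            return False
    return True


# Table 10-2, Case 4: two accesses to one internal block, one to another.
print(fits_in_one_cycle([("internal", "RAM0"), ("internal", "RAM0"),
                         ("internal", "RAM1")]))  # True
# Two accesses on the global bus appear in neither table.
print(fits_in_one_cycle([("global", None), ("global", None)]))  # False
```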
10.4 Clocking of Memory Accesses

Internal clock phases (H1 and H3) and their relationship to memory ac- 
cesses are discussed in this section to show how the TMS320C40 handles 
multiple memory accesses. Whereas the previous section discussed the in- 
teraction between sequences of instructions, this section discusses the flow 
of data on an individual instruction basis. 

Each major clock period of 40 ns is composed of two minor clock periods 
of 20 ns, labeled H3 and H1 (these times assume a 50-MHz 'C40). The ac- 
tive clock period for H3 and H1 is the time when that signal is high. 



[Figure: H1 and H3 clock waveforms over one major clock period]
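The 40-ns and 20-ns figures follow from simple arithmetic, sketched below. The sketch assumes, as the 50-MHz example implies, that one major clock period spans two input-clock periods and that H1 and H3 each take half of it; the function name is hypothetical.

```python
def clock_periods_ns(clkin_mhz):
    """Return (major, minor) clock periods in ns for a given input clock.

    Assumption: one major period = two input-clock periods, and the
    H1 and H3 minor periods each occupy half of the major period.
    """
    clkin_ns = 1000.0 / clkin_mhz
    major = 2 * clkin_ns
    return major, major / 2


print(clock_periods_ns(50))  # (40.0, 20.0)
```

Under the same assumption, a 40-MHz input clock gives a 50-ns major period and 25-ns minor periods.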



The precise operation of memory reads and writes can be defined according 
to these minor clock periods. The types of memory operations that can occur 
are program fetches, data loads and stores, and DMA accesses. 

10.4.1 Program Fetches 

Internal program fetches are always performed during H3 unless a single 
data store must occur at the same time because of another instruction in the 
pipeline. In this case, the program fetch occurs during H1 and the data store 
during H3. 

External program fetches always start at the beginning of H3, with the
address presented on the external bus. They complete at the end of H1 with
the latching of the instruction word.
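The program-fetch timing rules above can be restated as a small lookup, sketched in Python. It encodes only the two rules in this subsection; the function name and return strings are hypothetical.

```python
def program_fetch_phase(external, single_store_pending=False):
    """Minor clock period(s) used by a program fetch in one major cycle."""
    if external:
        # Address driven at the start of H3; word latched at the end of H1.
        return "address at start of H3, instruction latched at end of H1"
    # Internal fetches use H3 unless a single data store needs H3,
    # in which case the fetch moves to H1.
    return "H1" if single_store_pending else "H3"


print(program_fetch_phase(external=False))                             # H3
print(program_fetch_phase(external=False, single_store_pending=True))  # H1
```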
10.4.2 Data Loads and Stores

Four types of instructions perform loads, memory reads, and stores:
two-operand instructions, three-operand instructions, multiplier/ALU
operations with stores, and parallel multiply and add instructions. See
Chapter 5 for detailed information on addressing modes.

As discussed in Chapter 7, the number of bus cycles for external memory 
accesses differs in some cases from the number of CPU execution cycles. 
For external reads, the number of bus cycles and CPU execution cycles is 
identical. For external writes, there are always at least two bus cycles, but 
unless there is a port access conflict, there is only one CPU execution cycle. 
In the following examples, any difference in the number of bus cycles and 
CPU cycles is noted. 

Two-Operand Instruction Memory Accesses 

Figure 10-2. Two-Operand Instruction Word 



Fields, from bit 31 down to bit 0: 0x0 (bits 31-29) | operation | G | dst (src) | src (dst) (bits 15-0)



Two-operand instructions include all instructions with bits 31-29 equal to
000₂ or 010₂ (see Figure 10-2). For a data read, bits 15-0 represent the src
operand. Internal data reads are always performed during H1. External data
reads always start at the beginning of H3, with the address presented on the
external bus, and they complete with the latching of the data word at the
end of H1.

For a data store, bits 15-0 represent the dst operand. Internal data stores
are performed during H3. External data stores always start at the beginning
of H3, with the address and data presented on the external bus.
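The two-operand timing rules, together with the bus-cycle versus CPU-cycle remarks at the start of this subsection, can be summarized in a sketch. It assumes zero wait states and no port conflict, and it reports 0 external bus cycles for internal accesses (an assumption for illustration); the function name and strings are hypothetical.

```python
def two_operand_access(kind, external):
    """Return (timing, cpu_cycles, external_bus_cycles) for one data access.

    Assumes zero wait states and no port access conflict.
    """
    if kind == "read":
        if external:
            # External reads: bus cycles equal CPU cycles.
            return ("address at start of H3, data latched at end of H1", 1, 1)
        return ("H1", 1, 0)
    if kind == "store":
        if external:
            # External writes: at least two bus cycles, one CPU cycle.
            return ("address and data at start of H3", 1, 2)
        return ("H3", 1, 0)
    raise ValueError("kind must be 'read' or 'store'")


for kind in ("read", "store"):
    for external in (False, True):
        print(kind, "external" if external else "internal",
              two_operand_access(kind, external))
```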

Three-Operand Instruction Memory Reads 

Figure 10-3. Three-Operand Instruction Word 
Three-operand instructions include all instructions with bits 31-29 equal to
001₂ (see Figure 10-3). The source operands, src1 and src2, come from either
registers or memory. When one or more of the source operands are from
memory, these instructions are always memory reads.
If only one of the source operands is from memory (either src1 or src2) and
is located in internal memory, the data is read during H1. If the single
memory source operand is in external memory, the read starts at the
beginning of H3, with the address presented on the external bus, and
completes with the latching of the data word at the end of H1.

If both source operands are to be fetched from memory, several cases occur.
If both operands are located in internal memory, the src1 read is performed
during H3 and the src2 read during H1, thus completing two memory reads in
a single cycle.

If src1 is in internal memory and src2 is in external memory, the src2 access
begins at the start of H3 and latches at the end of H1. At the same time, the
src1 access to internal memory is performed during H3. Again, two memory
reads are completed in a single cycle.

If src1 is in external memory and src2 is in internal memory, two cycles are
necessary to complete the two reads. In the first cycle, the internal src2
access is performed. The src1 access is also started, but its data is not
latched until the next H3.

If src1 and src2 are both in external memory, two cycles are required to
complete the two reads. In the first cycle, the src1 access is performed and
its data is loaded on the next H3; in the second cycle, the src2 access is
performed and its data is loaded on that cycle's H1.
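The four two-memory-operand cases above collapse to one observation: src1 uses H3 and src2 uses H1, and an external access spans H3 through the end of H1, so it fits in a single cycle only in the src2 position. The sketch below encodes that; the function name is hypothetical. The Parallel Multiplies and Adds subsection describes the same pattern for src3 and src4.

```python
def three_operand_read_cycles(src1_external, src2_external):
    """CPU cycles needed when both src1 and src2 come from memory.

    src2_external is accepted for symmetry but does not change the count:
    an external src2 overlaps the internal src1 read in the same cycle.
    """
    return 2 if src1_external else 1


for src1_ext in (False, True):
    for src2_ext in (False, True):
        cycles = three_operand_read_cycles(src1_ext, src2_ext)
        print(f"src1 external={src1_ext}, src2 external={src2_ext}: {cycles} cycle(s)")
```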



Operations with Parallel Stores 

Figure 10-4. Multiply or CPU Operation With a Parallel Store 
The next class of instructions includes all instructions that have a store in
parallel with another instruction. Bits 31 and 30 of these instructions are
equal to 11₂.

For operations that perform a multiply or ALU operation in parallel with a
store, the instruction word format is shown in Figure 10-4. Whether the store
to dst2 is external or internal, it is performed during H3. Two bus cycles
are required for external stores, but only one CPU cycle is necessary to
complete the write.

If the memory read operation is external, it starts at the beginning of H3 and
latches at the end of H1. If the memory read operation is internal, it is
performed during H1. Note that memory reads are performed by the CPU
during the read (R) phase of the pipeline, and stores are performed during
the execute (E) phase.

The instruction word format for instructions that have parallel stores to
memory is shown in Figure 10-5. If both destination operands, dst1 and
dst2, are located in internal memory, dst1 is stored during H3 and dst2
during H1, thus completing two memory stores in a single cycle.



Figure 10-5. Two Parallel Stores 
If dst1 is in external memory and dst2 is in internal memory, the dst1 store
begins at the start of H3. The dst2 store to internal memory is performed
during H1. Two bus cycles are required for the external store, but only one
CPU cycle is necessary to complete the write. Again, two memory stores are
completed in a single cycle.

If dst1 is in internal memory and dst2 is in external memory, an additional 
bus cycle is necessary to complete the dst2 store. Only one CPU cycle is 
necessary to complete the write, but the port access requires three bus 
cycles. In the first cycle, the internal dst1 store is performed during H3, and 
dst2 is written to the port during H1 . During the next cycle, the dst2 store is 
performed on the external bus, beginning in H3, and executes as normal 
through the following cycle. 

If dst1 and dst2 are both written to external memory, a single CPU cycle is
still all that is necessary to complete the stores. In this case, four bus
cycles are required:

1) In the first cycle, both dst1 and dst2 are written to the port, and the
external bus access for dst1 begins.

2) The store for dst1 is completed on the second cycle, and the store for
dst2 begins on the third external bus cycle.

3) Finally, the store for dst2 is completed on the fourth external bus cycle.
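The four parallel-store cases can be tabulated as (CPU cycles, external bus cycles), sketched below. The counts come from the text above and assume no wait states and no port conflict; the 0 for the all-internal case is an assumption (no external bus activity), and the names are hypothetical.

```python
# (dst1_external, dst2_external) -> external bus cycles for two parallel stores
PARALLEL_STORE_BUS_CYCLES = {
    (False, False): 0,  # assumed: both stores internal, no external bus use
    (True, False): 2,   # dst1 external: one 2-cycle external store
    (False, True): 3,   # dst2 external: an extra bus cycle is needed
    (True, True): 4,    # both external: four bus cycles
}


def parallel_store_cycles(dst1_external, dst2_external):
    """Return (cpu_cycles, external_bus_cycles); the CPU always needs one cycle."""
    return 1, PARALLEL_STORE_BUS_CYCLES[(dst1_external, dst2_external)]


print(parallel_store_cycles(False, True))  # (1, 3)
print(parallel_store_cycles(True, True))   # (1, 4)
```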
Parallel Multiplies and Adds

Memory addressing for parallel multiplies and adds is similar to that for
three-operand instructions. The parallel multiplies and adds include all
instructions with bits 31-30 equal to 10₂ (see Figure 10-6).

Figure 10-6. Parallel Multiplies and Adds 

For these operations, src3 and src4 are both located in memory. If both
operands are located in internal memory, the src3 read is performed during
H3 and the src4 read during H1, thus completing two memory reads in a
single cycle.

If src3 is in internal memory and src4 is in external memory, the src4 access 
begins at the start of H3 and latches at the end of H1 . At the same time, the 
src3 access to internal memory is performed during H3. Again, two memory 
reads are completed in a single cycle. 

If src3 is in external memory and src4 is in internal memory, two cycles are 
necessary to complete the two reads. In the first cycle, the internal src4 ac- 
cess is performed. During the H3 of the next cycle, the src3 access is per- 
formed. 

If src3 and src4 are both from external memory, two cycles are necessary 
to complete the two reads. In the first cycle, the src3 access is performed; 
in the second cycle, the src4 access is performed. 
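The opcode bits quoted in this section (bits 31-29 = 000₂ or 010₂ for two-operand, 001₂ for three-operand; bits 31-30 = 11₂ for parallel stores and 10₂ for parallel multiplies and adds) can be combined into a small classifier, sketched below. It covers only the classes named in this section; any other pattern (for example, bits 31-29 = 011₂) is reported as "other", and the instruction-encoding pages remain the authority. The function name and labels are hypothetical.

```python
def instruction_class(word):
    """Classify a 32-bit instruction word by its top bits (this section only)."""
    word &= 0xFFFFFFFF
    top3 = (word >> 29) & 0b111
    top2 = (word >> 30) & 0b11
    if top2 == 0b11:
        return "parallel store"
    if top2 == 0b10:
        return "parallel multiply/add"
    if top3 in (0b000, 0b010):
        return "two-operand"
    if top3 == 0b001:
        return "three-operand"
    return "other"


for word in (0x00000000, 0x20000000, 0x40000000, 0x80000000, 0xC0000000):
    print(hex(word), instruction_class(word))
```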






Chapter 11 



Assembly Language Instructions 




The TMS320C40 assembly language instruction set supports 
numeric-intensive, signal processing, and general-purpose applications. 
The instructions are organized into these major groups: load-and-store, 
two- or three-operand arithmetic/logical, parallel, program control, and 
interlocked operations instructions. The addressing modes used with the 
instructions are described in Chapter 5. 

The TMS320C40 instruction set can also use one of 20 condition codes with 
any of the 10 conditional instructions, such as LDFcond. This chapter 
defines the condition codes and flags. 

The assembler allows optional syntax forms to simplify the assembly 
language for special-case instructions. These optional forms are listed and 
explained. 

Each of the individual instructions is described and listed in alphabetical 
order. An example instruction (on pages 11-15 through 11-17) 
demonstrates the special format used and explains its content. 



This chapter discusses the following major topics: 

Section Page 

11.1 Instruction Set 11-3 

■ Load-and-Store Instructions 11-3 

■ Two-Operand Arithmetic/Logical Instructions 11-4 

■ Three-Operand Arithmetic/Logical Instructions 11-6 

■ Program Control Instructions 11-6 

■ Interlocked Operations Instructions 11-7 

■ Parallel Operations Instructions 11-8 
11.2 Condition Codes and Flags 11-10

11.3 Individual Instructions 11-13

■ Symbols and Abbreviations Used in Instructions 11-13

■ Optional Assembler Syntaxes 11-15

■ Individual instruction descriptions, alphabetized
(includes syntax, operation, operands, encoding,
description, cycles, status bits, mode bit, examples) 11-17
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^ Assembly Language Instructions — Instruction Set 

11.1 Assembly Language Instructions — Instruction Set 

The TMS320C40 instruction set is exceptionally well-suited to digital signal 
processing and other numeric-intensive applications. All instructions are a 
single machine word long, and most instructions take a single cycle to ex- 
ecute. In addition to multiply and accumulate instructions, the TMS320C40 
possesses a full complement of general-purpose instructions. 

The instruction set contains 135 instructions organized into the following 
functional groups: 

□ Load-and-store 

□ Two-operand arithmetic/logical 

□ Three-operand arithmetic/logical 

□ Program control 

□ Interlocked operations 

□ Parallel operations 

Each of these groups is discussed in the succeeding subsections. 

11.1.1 Load-and-Store Instructions 

The TMS320C40 supports 23 load-and-store instructions (see Table 11-1). 
These instructions can 

□ Load a word from memory into a register, 

□ Store a word from a register into memory, or 

□ Manipulate data on the system stack. 

Two of these instructions (LDFcond and LDIcond) can load data conditionally.
This is useful for locating the maximum or minimum value in a data set. See
Section 11.2 for detailed information on condition codes.
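The maximum-search use of a conditional load can be modeled in Python as a compare that produces a condition, followed by a load that happens only when the condition holds, with no branch around the load. This is a behavioral sketch of the idiom, not C40 code; the function name is hypothetical, and the exact condition code to pair with a compare depends on its operand order.

```python
def find_max(values):
    """Return the largest value using compare-then-conditional-load steps."""
    best = values[0]
    for v in values[1:]:
        is_greater = v > best  # stands in for the compare that sets ST flags
        if is_greater:         # stands in for the conditional load (LDIcond)
            best = v
    return best


print(find_max([3, -7, 12, 5]))  # 12
```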
Table 11-1. Load-and-Store Instructions

Instruction | Description
LBb         | Load byte (signed)
LBUb        | Load byte (unsigned)
LDA         | Load address register
LDE         | Load floating-point exponent
LDEP        | Load integer, expansion file register to primary register
LDF         | Load floating-point value
LDFcond     | Load floating-point value conditionally
LDHI        | Load 16-bit unsigned immediate into 16 MSBs
LDI         | Load integer
LDIcond     | Load integer conditionally
LDM         | Load floating-point mantissa
LDPE        | Load integer, primary register to expansion file register
LDPK        | Load DP register immediate
LHw         | Load half-word signed
LHUw        | Load half-word unsigned
LWLct       | Load word left-shifted
LWRct       | Load word right-shifted
POPF        | Pop floating-point value from stack
PUSH        | Push integer on stack
PUSHF       | Push floating-point value on stack
STF         | Store floating-point value
STI         | Store integer
STIK        | Store integer immediate


11.1.2 Two-Operand Instructions

The TMS320C40 supports a complete set of 43 two-operand arithmetic and 
logical instructions. The two operands are the source and destination. The 
source operand may be a memory word, a register, or a constant. The desti- 
nation operand is always a register. 

These instructions provide integer, floating-point, or logical operations, 
and multiprecision arithmetic. Table 11-2 lists these instructions. 
Table 11-2. Two-Operand Instructions

Instruction | Description
ABSF        | Absolute value of a floating-point number
ABSI        | Absolute value of an integer
ADDC†       | Add integers with carry
ADDF†       | Add floating-point values
ADDI†       | Add integers
AND†        | Bitwise logical-AND
ANDN†       | Bitwise logical-AND with complement
ASH†        | Arithmetic shift
CMPF†       | Compare floating-point values
CMPI†       | Compare integers
FIX         | Convert floating-point value to integer
FLOAT       | Convert integer to floating-point value
FRIEEE      | Convert IEEE floating-point format to twos-complement floating-point format
LSH†        | Logical shift
MBct        | Merge byte, left shifted
MHct        | Merge half-word, left shifted
MPYF†       | Multiply floating-point values
MPYI†       | Multiply integers
MPYSHI†     | Multiply signed integer, 32-MSB product
MPYUHI†     | Multiply unsigned integer, 32-MSB product
NEGB        | Negate integer with borrow
NEGF        | Negate floating-point value
NEGI        | Negate integer
NORM        | Normalize floating-point value
NOT         | Bitwise logical-complement
OR†         | Bitwise logical-OR
RCPF        | Reciprocal, floating-point
RND         | Round floating-point value
ROL         | Rotate left
ROLC        | Rotate left through carry
ROR         | Rotate right
RORC        | Rotate right through carry
RSQRF       | Reciprocal of square root, floating-point
SUBB†       | Subtract integers with borrow
SUBC        | Subtract integers conditionally
SUBF†       | Subtract floating-point values
SUBI†       | Subtract integers
SUBRB       | Subtract reverse integer with borrow
SUBRF       | Subtract reverse floating-point value
SUBRI       | Subtract reverse integer
TOIEEE      | Convert twos-complement to IEEE format
TSTB†       | Test bit fields
XOR†        | Bitwise exclusive-OR

† Two- and three-operand versions
11.1.3 Three-Operand Instructions

Most instructions contain two or three operands. The 19 three-operand in- 
structions allow the TMS320C40 to read two operands from memory or the 
CPU register file in a single cycle and store the results in a register. The fol- 
lowing differentiates the two- and three-operand instructions: 

□ Two-operand instructions have a single-source operand (or shift count) 
and a destination operand. 

□ Three-operand instructions may have two source operands (or one 
source operand and a count operand) and a destination operand. A 
source operand may be a memory word, a register or a constant. The 
destination of a three-operand instruction is always a register. 

Table 11-3 lists the instructions that have three-operand versions. Note
that the 3 in the mnemonic can be omitted from three-operand instructions
(see subsection 11.3.2).

Table 11-3. Three-Operand Instructions

Instruction | Description
ADDC3       | Add with carry
ADDF3       | Add floating-point values
ADDI3       | Add integers
AND3        | Bitwise logical-AND
ANDN3       | Bitwise logical-AND with complement
ASH3        | Arithmetic shift
CMPF3       | Compare floating-point values
CMPI3       | Compare integers
LSH3        | Logical shift
MPYF3       | Multiply floating-point values
MPYI3       | Multiply integers
MPYSHI3     | Multiply signed integer, 32-MSB product
MPYUHI3     | Multiply unsigned integer, 32-MSB product
OR3         | Bitwise logical-OR
SUBB3       | Subtract integers with borrow
SUBF3       | Subtract floating-point values
SUBI3       | Subtract integers
TSTB3       | Test bit fields
XOR3        | Bitwise exclusive-OR



11.1.4 Program Control Instructions 

The program-control instruction group consists of the 23 instructions that
affect program flow. The repeat mode allows repetition of a block of code
(RPTB and RPTBD) or of a single line of code (RPTS). Both standard and
delayed (single-cycle) branching are supported. Several of the program
control instructions are capable of conditional operations (see Section 11.2
for detailed information on condition codes). Table 11-4 lists the program
control instructions.
Table 11-4. Program Control Instructions

Instruction | Description
Bcond       | Branch conditionally (standard)
BcondAF     | Branch conditionally delayed and annul if false
BcondAT     | Branch conditionally delayed and annul if true
BcondD      | Branch conditionally (delayed)
BR          | Branch unconditionally (standard)
BRD         | Branch unconditionally (delayed)
CALL        | Call subroutine
CALLcond    | Call subroutine conditionally
DBcond      | Decrement and branch conditionally (standard)
DBcondD     | Decrement and branch conditionally (delayed)
IDLE        | Idle until interrupt
LAJ         | Link and jump
LAJcond     | Link and jump conditionally
LATcond     | Link and trap conditionally
NOP         | No operation
RETIcond    | Return from interrupt conditionally
RETIcondD   | Return from trap or interrupt conditionally, delayed
RETScond    | Return from subroutine conditionally
RPTB        | Repeat block of instructions
RPTBD       | Repeat block, delayed
RPTS        | Repeat single instruction
SWI         | Software interrupt
TRAPcond    | Trap conditionally



11.1.5 Interlocked Operations Instructions

The interlocked operations instructions support multiprocessor communi-
cation and the use of external signals to allow for powerful synchronization
mechanisms. They also guarantee the integrity of the communication and
result in high-speed operation. Refer to Chapter 7 for examples of the use
of interlocked instructions.

Table 11-5. Interlocked Operations Instructions

Instruction | Description
LDFI        | Load floating-point value, interlocked
LDII        | Load integer, interlocked
SIGI        | Signal, interlocked
STFI        | Store floating-point value, interlocked
STII        | Store integer, interlocked
11.1.6 Parallel Operations Instructions

The parallel-operations instructions group makes a high degree of parallel- 
ism possible. Some of the TMS320C40 instructions can occur in pairs that 
will be executed in parallel. These instructions offer the following features: 

□ Parallel loading of registers, 

□ Parallel arithmetic operations, or 

□ Arithmetic/logical instructions used in parallel with a store instruction. 

Each instruction in a pair is entered as a separate source statement. The 
second instruction in the pair must be preceded by two vertical bars (||). 
Table 11-6 lists the valid instruction pairs. 

Table 11-6. Parallel Instructions

Mnemonic            Description

Parallel Arithmetic with Store Instructions
ABSF || STF         Absolute value of a floating-point number and store floating-point value
ABSI || STI         Absolute value of an integer and store integer
ADDF3 || STF        Add floating-point values and store floating-point value
ADDI3 || STI        Add integers and store integer
AND3 || STI         Bitwise logical-AND and store integer
ASH3 || STI         Arithmetic shift and store integer
FIX || STI          Convert floating-point to integer and store integer
FLOAT || STF        Convert integer to floating-point value and store floating-point value
FRIEEE || STF       Convert from IEEE floating-point format and store
LDF || STF          Load floating-point value and store floating-point value
LDI || STI          Load integer and store integer
LSH3 || STI         Logical shift and store integer
MPYF3 || STF        Multiply floating-point values and store floating-point value
MPYI3 || STI        Multiply integers and store integer
NEGF || STF         Negate floating-point value and store floating-point value
NEGI || STI         Negate integer and store integer
NOT3 || STI         Complement value and store integer
OR3 || STI          Bitwise logical-OR value and store integer
STF || STF          Store floating-point values
STI || STI          Store integers
SUBF3 || STF        Subtract floating-point value and store floating-point value
TOIEEE || STF       Convert to IEEE format and store
SUBI3 || STI        Subtract integer and store integer
XOR3 || STI         Bitwise exclusive-OR values and store integer

Parallel Load Instructions
LDF || LDF          Load floating-point
LDI || LDI          Load integer

Parallel Multiply and Add/Subtract Instructions
MPYF3 || ADDF3      Multiply and add floating-point
MPYF3 || SUBF3      Multiply and subtract floating-point
MPYI3 || ADDI3      Multiply and add integer
MPYI3 || SUBI3      Multiply and subtract integer
11.2 Condition Codes and Flags

The TMS320C40 provides 20 condition codes (00000-10100, excluding
01011) that can be used with any of the conditional instructions, such as
RETScond or LDFcond. The conditions include signed and unsigned com-
parisons, comparisons to zero, and comparisons based on the status of in-
dividual condition flags. Note that all conditional instructions can also accept
the suffix U to indicate unconditional operation.

Seven condition flags provide information about properties of the result of 
arithmetic and logical instructions. The condition flags are stored in the sta- 
tus register (ST) and are affected by an instruction based upon the SET 
COND field (bit 15 of the status register). 

□ If SET COND = 0, the ST condition flags are updated if the operation's
target is any extended-precision register (R0-R11).

□ If SET COND = 1, the ST condition flags are also updated if the
operation's target is any register in the primary register file except the
status register.

□ The value of SET COND (0 or 1) does not affect the behavior of the com-
pare instructions (CMPF, CMPF3, CMPI, CMPI3, TSTB, or TSTB3).
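The SET COND rule can be sketched as a Python predicate for a non-compare instruction writing a register. The caller states whether the target belongs to the primary register file (the default assumes it does); compare instructions are deliberately not modeled. The function and constant names are hypothetical.

```python
EXTENDED_PRECISION_REGS = {f"R{i}" for i in range(12)}  # R0-R11


def flags_updated(set_cond, target, target_in_primary_file=True):
    """Whether writing `target` updates the ST condition flags (most instructions)."""
    if target in EXTENDED_PRECISION_REGS:
        return True  # updated regardless of SET COND
    if set_cond == 1:
        # Any other primary-file register except the status register itself.
        return target_in_primary_file and target != "ST"
    return False


print(flags_updated(0, "R5"))   # True
print(flags_updated(0, "AR2"))  # False
print(flags_updated(1, "AR2"))  # True
print(flags_updated(1, "ST"))   # False
```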

The condition flags may be modified by most instructions when either of the 
preceding conditions is established and either of the following two cases oc- 
curs: 

□ A result is generated when the specified operation is performed to infi- 
nite precision. This is appropriate for compare-and-test instructions that 
do not store results in a register. It is also appropriate for arithmetic in- 
structions that produce underflow or overflow. 

□ The output is written to the destination register as shown in Table 11-7.
This is appropriate for other instructions that modify the condition flags.
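For the first case (flags derived from the infinite-precision result), the integer and floating-point flag rules given in the flag descriptions after Figure 11-1 can be sketched as follows. The sketch assumes a 32-bit integer destination (−2³¹ to 2³¹ − 1) and the exponent limits stated for UF (≤ −128) and V (> 127); it is a behavioral model, and all names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
FLOAT_EXP_MIN, FLOAT_EXP_MAX = -128, 127


def integer_flags(result):
    """N, Z, V for an integer result computed to infinite precision."""
    return {
        "N": result < 0,
        "Z": result == 0,
        "V": not (INT32_MIN <= result <= INT32_MAX),
    }


def float_flags(exponent):
    """UF and V from the exponent of a floating-point result."""
    return {"UF": exponent <= FLOAT_EXP_MIN, "V": exponent > FLOAT_EXP_MAX}


def latch(lv, luf, flags):
    """LV and LUF stay set once V or UF has been set (until cleared in ST)."""
    return lv or flags.get("V", False), luf or flags.get("UF", False)


print(integer_flags(2**31))  # {'N': False, 'Z': False, 'V': True}
print(float_flags(-128))     # {'UF': True, 'V': False}
```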

Table 11-7. Output Value Formats

Type of Operation | Output Format
Floating-point    | 8-bit exponent, 1 sign bit, 31-bit fraction
Integer           | 32-bit integer
Logical           | 32-bit unsigned integer



Figure 11-1 shows the condition flags in the low-order bits of the status reg-
ister. Following the figure is a list of status register condition flags and de-
scriptions of how the flags are set by most instructions. For specific details
of the effect of a particular instruction on the condition flags, see the de-
scription of that instruction in subsection 11.3.3.
Condition Codes and Flags



Figure 11-1. Status Register

 31-16  15        14    13   12  11  10  9    8   7    6    5   4   3  2  1  0
 xx     SET COND  PGIE  GIE  CC  CE  CF  PCF  RM  OVM  LUF  LV  UF  N  Z  V  C

NOTE: xx = reserved bit.

LUF Latched Underflow Condition Flag. LUF is set whenever UF (floating-point underflow flag) is set. LUF may be cleared only by a processor reset or by modifying it in the status register (ST).

LV Latched Overflow Condition Flag. LV is set whenever V (overflow 
condition flag) is set. Otherwise, it is unchanged. LV may be cleared 
only by a processor reset or by modifying it in the status register (ST). 

UF Floating-Point Underflow Condition Flag. A floating-point underflow occurs whenever the exponent of the result is less than or equal to -128. If a floating-point underflow occurs, UF is set, and the output value is set to 0. UF is cleared if a floating-point underflow does not occur.

N Negative Condition Flag. Logical operations assign N the state of the MSB of the output value. For integer and floating-point operations, N is set if the result is negative, and cleared otherwise. Zero is positive.

Z Zero Condition Flag. For logical, integer, and floating-point operations, Z is set if the output is 0, and cleared otherwise.

V Overflow Condition Flag. For integer operations, V is set if the result does not fit into the format specified for the destination, that is, if it lies outside the range -2^31 <= result <= 2^31 - 1. Otherwise, V is cleared. For floating-point operations, V is set if the exponent of the result is greater than 127; otherwise, V is cleared. Logical operations always clear V.

C Carry Flag. When an integer addition is performed, C is set if a carry occurs out of the bit corresponding to the MSB of the output. When an integer subtraction is performed, C is set if a borrow occurs into the bit corresponding to the MSB of the output. Otherwise, for integer operations, C is cleared. The carry flag is unaffected by floating-point and logical operations. For shift instructions, this flag is set to the final value shifted out; for a zero shift count, it is set to 0.
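The integer rules for N, Z, V, and C can be restated as a short C sketch. It is an illustration under stated assumptions, not TI code: operands are treated as 32-bit two's-complement words, the `flags_t` struct and the `add_flags`/`sub_flags` helpers are hypothetical names, and the OVM saturation and SET COND gating described elsewhere are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the integer condition flags (not a TI type). */
typedef struct {
    bool n, z, v, c;
} flags_t;

/* Integer addition: C = carry out of bit 31, V = signed overflow. */
uint32_t add_flags(uint32_t a, uint32_t b, flags_t *f)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;  /* bit 32 holds the carry */
    uint32_t r = (uint32_t)wide;
    f->c = (wide >> 32) != 0;
    /* overflow: both operands share a sign that differs from the result's */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) != 0;
    f->n = (r >> 31) != 0;                      /* N = MSB of the result */
    f->z = (r == 0);
    return r;
}

/* Integer subtraction a - b: C = borrow into bit 31 (a < b as unsigned). */
uint32_t sub_flags(uint32_t a, uint32_t b, flags_t *f)
{
    uint32_t r = a - b;
    f->c = a < b;
    /* overflow: operands differ in sign and the result's sign differs from a */
    f->v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    f->n = (r >> 31) != 0;
    f->z = (r == 0);
    return r;
}
```

For example, adding 1 to 7FFF FFFFh sets V and N but not C, while adding 1 to FFFF FFFFh sets C and Z but not V.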
Table 11-8 lists the condition mnemonic, code, description, and flag for each of the 20 condition codes.

Table 11-8. Condition Codes and Flags

Condition  Code   Description                          Flag†

Unconditional Compares
U          00000  Unconditional                        Don't care

Unsigned Compares
LO         00001  Lower than                           C
LS         00010  Lower than or same as                C OR Z
HI         00011  Higher than                          ~C AND ~Z
HS         00100  Higher than or same as               ~C
EQ         00101  Equal to                             Z
NE         00110  Not equal to                         ~Z

Signed Compares
L          00111  Less than                            N
LE         01000  Less than or equal to                N OR Z
GT         01001  Greater than                         ~N AND ~Z
GE         01010  Greater than or equal to             ~N
EQ         00101  Equal to                             Z
NE         00110  Not equal to                         ~Z

Compare to Zero
Z          00101  Zero                                 Z
NZ         00110  Not zero                             ~Z
P          01001  Positive                             ~N AND ~Z
N          00111  Negative                             N
NN         01010  Nonnegative                          ~N

Compare to Condition Flags
NN         01010  Nonnegative                          ~N
N          00111  Negative                             N
NZ         00110  Nonzero                              ~Z
Z          00101  Zero                                 Z
NV         01100  No overflow                          ~V
V          01101  Overflow                             V
NUF        01110  No underflow                         ~UF
UF         01111  Underflow                            UF
NC         00100  No carry                             ~C
C          00001  Carry                                C
NLV        10000  No latched overflow                  ~LV
LV         10001  Latched overflow                     LV
NLUF       10010  No latched floating-point underflow  ~LUF
LUF        10011  Latched floating-point underflow     LUF
ZUF        10100  Zero or floating-point underflow     Z OR UF

† The ~ means logical complement ("not true" condition).
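Because several mnemonics share a code (EQ and Z are both 00101, for instance), Table 11-8 reduces to twenty distinct predicates on the ST flags. The C sketch below encodes that mapping. The `st_flags_t` struct and `cond_true` function are hypothetical, and treating code 01011 and codes above 10100 as invalid is an assumption based only on their absence from the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the ST condition flags (not a TI data structure). */
typedef struct {
    bool luf, lv, uf, n, z, v, c;
} st_flags_t;

/* Returns true if the 5-bit condition code is satisfied by the flags.
 * Codes not listed in Table 11-8 set *valid = false and return false. */
bool cond_true(unsigned code, const st_flags_t *f, bool *valid)
{
    *valid = true;
    switch (code) {
    case 0x00: return true;                 /* U                   */
    case 0x01: return f->c;                 /* LO, C               */
    case 0x02: return f->c || f->z;         /* LS                  */
    case 0x03: return !f->c && !f->z;       /* HI                  */
    case 0x04: return !f->c;                /* HS, NC              */
    case 0x05: return f->z;                 /* EQ, Z               */
    case 0x06: return !f->z;                /* NE, NZ              */
    case 0x07: return f->n;                 /* L, N                */
    case 0x08: return f->n || f->z;         /* LE                  */
    case 0x09: return !f->n && !f->z;       /* GT, P               */
    case 0x0A: return !f->n;                /* GE, NN              */
    case 0x0C: return !f->v;                /* NV                  */
    case 0x0D: return f->v;                 /* V                   */
    case 0x0E: return !f->uf;               /* NUF                 */
    case 0x0F: return f->uf;                /* UF                  */
    case 0x10: return !f->lv;               /* NLV                 */
    case 0x11: return f->lv;                /* LV                  */
    case 0x12: return !f->luf;              /* NLUF                */
    case 0x13: return f->luf;               /* LUF                 */
    case 0x14: return f->z || f->uf;        /* ZUF                 */
    default:   *valid = false; return false; /* 01011 or > 10100   */
    }
}
```

Writing one `case` per code rather than per mnemonic mirrors the hardware view: the assembler maps each mnemonic to its 5-bit code, so aliases such as P and GT cannot disagree.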
11.3 Individual Instructions

This section contains the individual assembly language instructions for the TMS320C40. The instructions are listed in alphabetical order. Information for each instruction includes assembler syntax, operation, operands, encoding, description, cycles, status bits, mode bit, and examples.

Definitions of the symbols and abbreviations, as well as optional syntax forms allowed by the assembler, precede the individual instruction description section. Also, an example instruction shows the special format used and explains its content.

A functional grouping of the instructions, as well as a complete instruction set summary, can be found in Section 11.1. Appendix B lists the opcodes for all the instructions. Refer to Chapter 5 for information on memory addressing. Code examples using many of the instructions are given in the Software Applications chapter.

11.3.1 Symbols and Abbreviations

Table 11-9 lists the symbols and abbreviations used in the individual instruction descriptions.

Table 11-9. Instruction Symbols

Symbol      Meaning
src         Source operand
src1        Source operand 1
src2        Source operand 2
src3        Source operand 3
src4        Source operand 4
dst         Destination operand
dst1        Destination operand 1
dst2        Destination operand 2
disp        Displacement
cond        Condition
count       Shift count
G           General addressing modes
T           Three-operand addressing modes
P           Parallel addressing modes
B           Conditional-branch addressing modes
ARn         Auxiliary register n
IRn         Index register n
Rn          Register address n
RC          Repeat count register
RE          Repeat end address register
RS          Repeat start address register
ST          Status register
C           Carry bit
GIE         Global interrupt enable bit
N           Trap vector
PC          Program counter
RM          Repeat mode flag
SP          System stack pointer
|x|         Absolute value of x
x -> y      Assign the value of x to destination y
x(man)      Mantissa field (sign + fraction) of x
x(exp)      Exponent field of x
op1 || op2  Operation 1 performed in parallel with operation 2
x AND y     Bitwise logical-AND of x and y
x OR y      Bitwise logical-OR of x and y
x XOR y     Bitwise logical-XOR of x and y
~x          Bitwise logical-complement of x
x << y      Shift x to the left y bits
x >> y      Shift x to the right y bits
*++SP       Increment SP and use incremented SP as address
*SP--       Use SP as address and decrement SP
11.3.2 Optional Assembler Syntaxes

The assembler allows a relaxed syntax form for some instructions. These 
optional forms simplify the assembly language so that special-case syntax 
can be ignored. The following is a list of these optional syntax forms. 

□ The destination register can be omitted on unary arithmetic and logical operations when the same register is used as a source. For example,

 ABSI R0,R0 can be written as ABSI R0

Instructions affected: ABSI, ABSF, FIX, FLOAT, NEGB, NEGF, NEGI, NORM, NOT, RND.

□ All 3-operand instructions can be written without the 3. For example,

 ADDI3 R0,R1,R2 can be written as ADDI R0,R1,R2

Instructions affected: ADDC3, ADDF3, ADDI3, AND3, ANDN3, ASH3, LSH3, MPYF3, MPYI3, OR3, SUBB3, SUBF3, SUBI3, XOR3, MPYSHI3, MPYUHI3.

This also applies to all the pertinent parallel instructions.

□ All 3-operand comparison instructions can be written without the 3. For example,

 CMPI3 R0,*AR0 can be written as CMPI R0,*AR0

Instructions affected: CMPI3, CMPF3, TSTB3.

□ Indirect operands with an explicit 0 displacement are allowed. In 3-operand or parallel instructions, operands with 0 displacement are automatically converted to no-displacement mode. For example:

 LDI *+AR0(0),R1 is legal

Also

 ADDI3 *+AR0(0),R1,R2 is equivalent to ADDI3 *AR0,R1,R2

□ Indirect operands can be written with no displacement, in which case a displacement of 1 is assumed. For example,

 LDI *AR0++(1),R0 can be written as LDI *AR0++,R0

□ All conditional instructions accept the suffix U to indicate unconditional operation. Also, the U can be omitted from unconditional short branch instructions. For example:

 BU label can be written as B label

□ Labels can be written with or without a trailing colon. For example:

 label0: NOP
 label1 NOP
 label2: (label assembles to next source line)
□ Empty expressions are not allowed for the displacement in indirect mode:

 LDI *+AR0(),R0 is not legal

□ Immediate-mode destination operands of BR and CALL can be written with an at sign (@):

 BR label can be written as BR @label

□ The LDP pseudo-op can be used to load a register (DP by default) with the 16 MSBs of a relocatable address as follows:

 LDP addr,REG or LDP @addr,REG

The at sign (@) is optional.

If the destination REG is the DP, LDP generates an LDPK instruction. Otherwise, it generates an LDIU instruction. In both cases, an immediate operand with a special relocation type is used.

□ Parallel instructions can be written in either order. For example:

    ADDI       can be written as       STI
 || STI                             || ADDI

□ The parallel bars indicating part 2 of a parallel instruction can be written anywhere on the line from column 0 to the mnemonic. For example:

    ADDI       can be written as       ADDI
 || STI                              ||  STI

□ If the second operand of a parallel instruction is the same as the third (destination register) operand, the third operand can be omitted. This allows 3-operand parallel instructions to be written so that they look like normal 2-operand instructions. For example,

    ADDI *AR0,R2,R2     can be written as     ADDI *AR0,R2
 || MPYI *AR1,R0,R0                        || MPYI *AR1,R0

Instructions affected (applies to all parallel instructions that have a register as the second operand): ADDI, ADDF, AND, MPYI, MPYF, OR, SUBI, SUBF, XOR.

□ All commutative operations in parallel instructions can be written in either order. For example, the ADDI part of a parallel instruction can be written in either of two ways:

 ADDI *AR0,R1,R2 or ADDI R1,*AR0,R2

The instructions affected are parallel instructions containing any of the following: ADDI, ADDF, MPYI, MPYF, AND, OR, XOR.

□ Use the syntax in Table 11-10 to designate CPU registers in operands.
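The displacement rules above, and the way LDP pairs with direct addressing, come down to simple address arithmetic. The C sketch below assumes that a direct address is formed by concatenating the 16 LSBs of DP with the 16-bit address field of the instruction, and it models only a few indirect forms; every function name is hypothetical. The test values reuse register contents from the examples later in this section (DP = 80h with @98AEh, and the auxiliary-register updates shown there).

```c
#include <assert.h>
#include <stdint.h>

/* Direct addressing (assumed): DP supplies address bits 31-16 and the
 * instruction's 16-bit address field supplies bits 15-0. */
uint32_t direct_addr(uint32_t dp, uint16_t field)
{
    return ((dp & 0xFFFFu) << 16) | field;
}

/* What LDP addr loads into DP: the 16 MSBs of the 32-bit address. */
uint32_t ldp_value(uint32_t addr)
{
    return addr >> 16;
}

/* A few indirect forms. Pass disp = 1 where the displacement is omitted. */
uint32_t pre_add(uint32_t ar, uint32_t disp) { return ar + disp; } /* *+ARn(disp), ARn unchanged */
uint32_t pre_sub(uint32_t ar, uint32_t disp) { return ar - disp; } /* *-ARn(disp), ARn unchanged */

uint32_t pre_inc(uint32_t *ar, uint32_t disp)   /* *++ARn(disp): update, then use */
{
    *ar += disp;
    return *ar;
}

uint32_t post_inc(uint32_t *ar, uint32_t disp)  /* *ARn++(disp): use, then update */
{
    uint32_t addr = *ar;
    *ar += disp;
    return addr;
}

uint32_t post_dec(uint32_t *ar, uint32_t disp)  /* *ARn--(disp): use, then update */
{
    uint32_t addr = *ar;
    *ar -= disp;
    return addr;
}
```

With DP = 80h, the direct operand @98AEh resolves to 0080 98AEh, which is the memory location used in the example instruction.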
11.3.3 Individual Instruction Descriptions

Each assembly language instruction for the TMS320C40 is described in 
this section in alphabetical order. The description includes the assembler 
syntax, operation, operands, encoding, description, cycles, status bits, 
mode bit, and examples. 

Table 11-10. CPU Register Syntax

Assembler  Register Machine  Assigned Function Name                 Explained On
Syntax     Value (hex)                                              Paragraph  Page
R0         00                Extended-precision register 0          3.1.1      3-4
R1         01                Extended-precision register 1          3.1.1      3-4
R2         02                Extended-precision register 2          3.1.1      3-4
R3         03                Extended-precision register 3          3.1.1      3-4
R4         04                Extended-precision register 4          3.1.1      3-4
R5         05                Extended-precision register 5          3.1.1      3-4
R6         06                Extended-precision register 6          3.1.1      3-4
R7         07                Extended-precision register 7          3.1.1      3-4
R8         1C                Extended-precision register 8          3.1.1      3-4
R9         1D                Extended-precision register 9          3.1.1      3-4
R10        1E                Extended-precision register 10         3.1.1      3-4
R11        1F                Extended-precision register 11         3.1.1      3-4
AR0        08                Auxiliary register 0                   3.1.2      3-5
AR1        09                Auxiliary register 1                   3.1.2      3-5
AR2        0A                Auxiliary register 2                   3.1.2      3-5
AR3        0B                Auxiliary register 3                   3.1.2      3-5
AR4        0C                Auxiliary register 4                   3.1.2      3-5
AR5        0D                Auxiliary register 5                   3.1.2      3-5
AR6        0E                Auxiliary register 6                   3.1.2      3-5
AR7        0F                Auxiliary register 7                   3.1.2      3-5
DP         10                Data-page pointer                      3.1.3      3-5
IR0        11                Index register 0                       3.1.4      3-5
IR1        12                Index register 1                       3.1.4      3-5
BK         13                Block-size register                    3.1.5      3-5
SP         14                System stack pointer                   3.1.6      3-5
ST         15                Status register                        3.1.7      3-5
DIE        16                DMA coprocessor interrupt enable       3.1.8      3-8
IIE        17                Internal-interrupt enable register     3.1.9      3-10
IIF        18                IIOF pins and interrupt flag register  3.1.10     3-12
RS         19                Repeat start address                   3.1.11     3-14
RE         1A                Repeat end address                     3.1.11     3-14
RC         1B                Repeat counter                         3.1.11     3-14



EXAMPLE Example Instruction 



Syntax INST src, dst 
or 

INST1 src2,dst1 
|| INST2 src3, dst2 

Each instruction begins with an assembler syntax expression. Labels may 
be placed either before the command (instruction mnemonic) on the same 
line or on the preceding line in the first column. The optional comment field 
that concludes the syntax is not included in the syntax expression. 
Space(s) are required between each field (label, command, operand, and 
comment fields). 

The syntax examples illustrate the common one-line syntax and the two-line syntax used in parallel addressing. Note that the two vertical bars || that indicate a parallel addressing pair can be placed anywhere before the mnemonic on the second line. The first instruction in the pair can have a label, but the second instruction cannot.

Operation |src| -> dst
 or

 |src2| -> dst1
|| src3 -> dst2

The instruction operation sequence describes the processing that takes place when the instruction is executed. For parallel instructions, the operation sequence is performed in parallel. Conditional effects of status register specified modes are listed for conditional instructions such as Bcond.

Operands src general addressing modes (G):

 0 0 register (any register in CPU primary register file)
 0 1 direct
 1 0 indirect
 1 1 immediate

 dst register (any register in CPU primary register file)

 or

 src2 indirect (disp = 0, 1, IR0, IR1)
 dst1 register (R0-R7)
 src3 register (R0-R7)
 dst2 indirect (disp = 0, 1, IR0, IR1)

Operands are defined according to the addressing mode and/or the type of addressing used. Note that indirect addressing uses displacements and the index registers. Refer to Chapter 5 for detailed information on addressing.



Encoding

 General addressing (bits 31-0):   INST | G | dst | src

 or

 Parallel addressing (bits 31-0):  INST1 || INST2 | dst1 | src3 | dst2 | src2

Encoding examples are shown for general addressing and parallel addressing. The instruction pair for the parallel addressing example consists of INST1 and INST2.

Description Instruction execution and its effect on the rest of the processor or memory 
contents are described. Any constraints on the operands imposed by the 
processor or the assembler are discussed. The description parallels and 
supplements the information given by the operation block. 

Cycles 1 

The digit specifies the number of cycles required to execute the instruction. 

Status Bits LUF Latched Floating-Point Underflow Condition Flag. 1 if a floating-point underflow occurs, unchanged otherwise.

LV Latched Overflow Condition Flag. 1 if an integer or floating-point overflow occurs, unchanged otherwise.

UF Floating-Point Underflow Condition Flag. 1 if a floating-point underflow occurs, 0 otherwise.

N Negative Condition Flag. 1 if a negative result is generated, 0 otherwise. In some instructions, this flag is the MSB of the output.

Z Zero Condition Flag. 1 if a zero result is generated, 0 otherwise. For logical and shift instructions, 1 if a zero output is generated, 0 otherwise.

V Overflow Condition Flag. 1 if an integer or floating-point overflow occurs, 0 otherwise.

C Carry Flag. 1 if a carry or borrow occurs, 0 otherwise. For shift instructions, this flag is set to the value of the last bit shifted out; 0 for a shift count of 0.

The seven condition flags are stored in the status register (ST). They provide information about the properties of the result or output of arithmetic or logical operations.
Mode Bit OVM Overflow Mode Flag. In general, integer operations are affected by the OVM bit value (described in Table 3-2 on page 3-6).

Example INST @98AEh,R5

Before Instruction:

DP = 80h
R5 = 07 6690 0000h = 2.30562500e+02
Memory at 0080 98AEh = 5CDFh = 1.00001107e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R5 = 00 6690 0000h = 1.80126953e+00
Memory at 0080 98AEh = 5CDFh = 1.00001107e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0

The sample code presented in the above format shows the effect of the code on system pointers (e.g., DP or SP), registers (e.g., R1 or R5), memory at specific locations, and the seven status bits. The values given for the registers include the leading zeros to show the exponent in floating-point operations. Decimal conversions are provided for all register and memory locations. The seven status bits are listed in the order in which they appear in the assembler and simulator (see Section 11.2 on page 11-10 and Table 11-8 on page 11-12 for further information on these seven status bits).
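The decimal conversions in these examples can be checked with a small decoder for the 40-bit extended-precision format of Table 11-7 (8-bit exponent, sign bit, 31-bit fraction) and for the 32-bit single-precision words held in memory. The sketch assumes the usual TMS320C4x interpretation: the exponent e is two's complement, e = -128 represents zero, a positive number is (1 + f) * 2^e, and a negative number is (-2 + f) * 2^e, where f is the fraction scaled to [0, 1). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^e by repeated doubling or halving; exact in double for |e| <= 128. */
double pow2i(int e)
{
    double p = 1.0;
    for (; e > 0; e--) p *= 2.0;
    for (; e < 0; e++) p /= 2.0;
    return p;
}

/* Decode a 40-bit extended-precision value from its 8-bit exponent field
 * and its 32-bit mantissa field (sign bit followed by a 31-bit fraction). */
double decode_ext(uint8_t exp_field, uint32_t man_field)
{
    int e = exp_field >= 0x80 ? (int)exp_field - 256 : (int)exp_field; /* two's complement */
    if (e == -128)
        return 0.0;                                   /* exponent -128 encodes zero */
    double f = (double)(man_field & 0x7FFFFFFFu) / 2147483648.0;  /* fraction / 2^31 */
    double m = (man_field & 0x80000000u) ? -2.0 + f : 1.0 + f;
    return m * pow2i(e);
}

/* Decode a 32-bit single-precision word (8-bit exponent, sign, 23-bit fraction)
 * by widening its 24-bit mantissa field to the 32-bit layout above. */
double decode_short(uint32_t word)
{
    return decode_ext((uint8_t)(word >> 24), (word & 0x00FFFFFFu) << 8);
}
```

For instance, `decode_ext(0x07, 0x66900000)` yields 230.5625, the value listed for R5 before the example instruction, and `decode_short(0x086B2800)` yields 470.3125.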
Absolute Value of Floating-Point Number ABSF



Syntax ABSF src, dst

Operation |src| -> dst

Operands src general addressing modes (G):

 0 0 register (R0-R11)
 0 1 direct
 1 0 indirect
 1 1 immediate

 dst register (R0-R11)



Encoding

 31-29  28-23   22-21  20-16  15-0
 000    000000  G      dst    src

Description The absolute value of the src operand is loaded into the dst register. The src and dst operands are assumed to be floating-point numbers.

An overflow occurs if src(man) = 8000 0000h and src(exp) = 7Fh. The result is dst(man) = 7FFF FFFFh and dst(exp) = 7Fh.

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 0. 

N 0. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example ABSF R4,R7

Before Instruction:

R4 = 05C 8000 F971h = -9.90337307e+27
R7 = 07D 2511 00AEh = 5.48527255e+37
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R4 = 05C 8000 F971h = -9.90337307e+27
R7 = 05C 7FFF 068Fh = 9.90337307e+27
LUF LV UF N Z V C = 0 0 0 0 0 0 0
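The ABSF example above can be reproduced on the raw register fields. In this sketch, which is an assumption about how the result is formed rather than a description of the hardware datapath, a negative operand is made positive by taking the two's complement of its 32-bit mantissa field while keeping the exponent. A mantissa of 8000 0000h (the value -2 * 2^e) instead needs the exponent incremented, which at 7Fh produces the overflow case listed under Description. Returning zero with a cleared mantissa is also an assumption. The `ext_float_t` type and `absf` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw view of a 40-bit extended-precision register. */
typedef struct {
    uint8_t  exp;   /* 8-bit two's-complement exponent field  */
    uint32_t man;   /* sign bit followed by a 31-bit fraction */
} ext_float_t;

/* Sketch of ABSF on raw fields; *overflow mirrors the V flag. */
ext_float_t absf(ext_float_t x, bool *overflow)
{
    *overflow = false;
    if (x.exp == 0x80) {                 /* exponent -128: the value is zero */
        x.man = 0;
        return x;
    }
    if ((x.man & 0x80000000u) == 0)      /* already positive */
        return x;
    if (x.man == 0x80000000u) {          /* value is -2 * 2^exp */
        if (x.exp == 0x7F) {             /* 2 * 2^127 is not representable */
            *overflow = true;
            x.man = 0x7FFFFFFFu;         /* largest positive mantissa */
            return x;
        }
        x.exp = (uint8_t)(x.exp + 1);    /* 2 * 2^e == 1.0 * 2^(e+1) */
        x.man = 0;
        return x;
    }
    x.man = 0u - x.man;                  /* (-2 + f) becomes 1 + (1 - f) */
    return x;
}
```

Applied to R4 = 05C 8000 F971h, the sketch returns exponent 5Ch and mantissa 7FFF 068Fh, the R7 value shown after the instruction.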
Syntax ABSF src2, dst1
|| STF src3, dst2

Operation |src2| -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
 dst1 register (R0-R7)
 src3 register (R0-R7)
 dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding

 31-30  29-25  24-22  21-19  18-16  15-8  7-0
 11     00100  dst1   000    src3   dst2  src2

Description A floating-point absolute value and a floating-point store are performed in parallel. All registers are read at the beginning and loaded at the end of the execute cycle. This means that if one of the parallel operations (STF) reads from a register and the operation being performed in parallel (ABSF) writes to the same register, then STF accepts as input the contents of the register before it is modified by the ABSF.

If src2 and dst2 point to the same location, src2 is read before the write to dst2. If src3 and dst1 point to the same register, src3 is read before the write to dst1.

An overflow occurs if src2(man) = 8000 0000h and src2(exp) = 7Fh. The result is dst1(man) = 7FFF FFFFh and dst1(exp) = 7Fh.

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 0. 

N 0. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 
Parallel ABSF and STF ABSF||STF



Example ABSF *++AR3(IR1),R4
|| STF R4,*-AR7(1)

Before Instruction:

AR3 = 80 9800h
IR1 = 0AFh
R4 = 07 33C0 0000h = 1.79750e+02
AR7 = 80 98C5h
Data at 80 98AFh = 058B 4000h = -6.118750e+01
Data at 80 98C4h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR3 = 80 98AFh
IR1 = 0AFh
R4 = 05 74C0 0000h = 6.118750e+01
AR7 = 80 98C5h
Data at 80 98AFh = 058B 4000h = -6.118750e+01
Data at 80 98C4h = 0733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
ABSI Absolute Value of Integer



Syntax ABSI src, dst

Operation |src| -> dst

Operands src general addressing modes (G):

 0 0 register (any register in CPU primary register file)
 0 1 direct
 1 0 indirect
 1 1 immediate

 dst register (any register in CPU primary register file)



Encoding

 31-29  28-23   22-21  20-16  15-0
 000    000001  G      dst    src

Description The absolute value of the src operand is loaded into the dst register. The src and dst operands are assumed to be signed integers.

An overflow occurs if src = 8000 0000h. If ST(OVM) = 1, the result is dst = 7FFF FFFFh. If ST(OVM) = 0, the result is dst = 8000 0000h.

Cycles 1



Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 0.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C Unaffected.



Mode Bit OVM Operation is affected by OVM bit value. 



Example 1 ABSI R0,R0
 or ABSI R0

Before Instruction:

R0 = 0FFFF FFCBh = -53

After Instruction:

R0 = 035h = 53

Example 2 ABSI *AR1,R3

Before Instruction:

AR1 = 20h
R3 = 0h
Data at 20h = 0FFFF FFCBh = -53

After Instruction:

AR1 = 20h
R3 = 35h = 53
Data at 20h = 0FFFF FFCBh = -53
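A C sketch of the integer ABSI rule, including its single overflow case and the effect of ST(OVM). The function name and its flag outputs are hypothetical; the operand is treated as a 32-bit two's-complement word, and only V and Z are modelled because the status-bit list above fixes N and UF at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of ABSI: returns the 32-bit result and reports V and Z. */
uint32_t absi(uint32_t src, bool ovm, bool *v, bool *z)
{
    uint32_t result;
    *v = (src == 0x80000000u);            /* |-2^31| does not fit in 32 bits */
    if (*v)
        result = ovm ? 0x7FFFFFFFu        /* OVM = 1: saturate               */
                     : 0x80000000u;       /* OVM = 0: result wraps unchanged */
    else if (src & 0x80000000u)
        result = 0u - src;                /* negate a negative value         */
    else
        result = src;                     /* already nonnegative             */
    *z = (result == 0);
    return result;
}
```

With src = FFFF FFCBh (-53), as in both examples above, the sketch returns 35h (53) and leaves V and Z clear.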
ABSI||STI Parallel ABSI and STI



Syntax ABSI src2, dst1
|| STI src3, dst2

Operation |src2| -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
 dst1 register (R0-R7)
 src3 register (R0-R7)
 dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding

 31-30  29-25  24-22  21-19  18-16  15-8  7-0
 11     00101  dst1   000    src3   dst2  src2



Description An integer absolute value and an integer store are performed in parallel. All registers are read at the beginning and loaded at the end of the execute cycle. This means that if one of the parallel operations (STI) reads from a register and the operation being performed in parallel (ABSI) writes to the same register, then STI accepts as input the contents of the register before it is modified by the ABSI.

If src2 and dst2 point to the same location, src2 is read before the write to dst2.

An overflow occurs if src2 = 8000 0000h. If ST(OVM) = 1, the result is dst1 = 7FFF FFFFh. If ST(OVM) = 0, the result is dst1 = 8000 0000h.

Cycles 1



Status Bits LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 0. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is affected by OVM bit value. 



Example ABSI *-AR5(1),R5
|| STI R1,*AR2--(IR1)

Before Instruction:

AR5 = 80 99E2h
R5 = 0h
R1 = 42h = 66
AR2 = 80 98FFh
IR1 = 0Fh
Data at 80 99E1h = 0FFFF FFCBh = -53
Data at 80 98FFh = 2h = 2
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR5 = 80 99E2h
R5 = 35h = 53
R1 = 42h = 66
AR2 = 80 98F0h
IR1 = 0Fh
Data at 80 99E1h = 0FFFF FFCBh = -53
Data at 80 98FFh = 42h = 66
LUF LV UF N Z V C = 0 0 0 0 0 0 0
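The read-at-start, write-at-end rule shared by ABSF||STF and ABSI||STI can be modelled by sampling every source before any destination is written. The sketch below is hypothetical (`regs_t` and `absi_par_sti` are illustrative names) and ignores the flags and OVM; the source operand is passed as a value rather than fetched through an address register. Its first test case uses R4 as both dst1 and src3 to show that the store sees the old register value, and its second uses the register values from the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of R0-R7 holding 32-bit integers. */
typedef struct {
    int32_t r[8];
} regs_t;

/* ABSI src2, dst1 || STI src3, dst2 modelled in two phases.
 * INT32_MIN is passed through unchanged (the OVM = 0 behavior). */
void absi_par_sti(regs_t *rg, int32_t src2, unsigned dst1,
                  unsigned src3, int32_t *dst2)
{
    /* Read phase: every source is sampled before anything is written. */
    int32_t abs_val = (src2 < 0 && src2 != INT32_MIN) ? -src2 : src2;
    int32_t store_val = rg->r[src3];

    /* Write phase: both destinations are updated at the end of the cycle. */
    rg->r[dst1] = abs_val;
    *dst2 = store_val;
}
```

Swapping the two phases (writing dst1 before reading src3) would make the store pick up the new absolute value, which is exactly the behavior the Description rules out.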
ADDC Add Integer With Carry



Syntax ADDC src, dst

Operation dst + src + C -> dst

Operands src general addressing modes (G):

 0 0 register (any register in CPU primary register file)
 0 1 direct
 1 0 indirect
 1 1 immediate

 dst register (any register in CPU primary register file)



Encoding

 31-29  28-23   22-21  20-16  15-0
 000    000010  G      dst    src



Description The sum of the dst and src operands and the C (carry) flag is loaded into the dst register. The dst and src operands are assumed to be signed integers.

Cycles 1 

Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a carry occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value. 

Example ADDC R1,R5

Before Instruction:

R1 = 00 FFFF 5C25h = -41,947
R5 = 00 FFFF 019Eh = -65,122
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R1 = 00 FFFF 5C25h = -41,947
R5 = 00 FFFE 5DC4h = -107,068
LUF LV UF N Z V C = 0 0 0 1 0 0 1
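One common use of ADDC is building additions wider than 32 bits: add the low words with an instruction that sets C, then add the high words with ADDC. The C sketch below models that chain. The function names are hypothetical, OVM and the flags other than C are ignored, and the first test case reproduces the R5 result of the example above with a carry-in of 1.

```c
#include <assert.h>
#include <stdint.h>

/* One ADDC step: dst + src + carry_in, reporting the carry out of bit 31. */
uint32_t addc(uint32_t dst, uint32_t src, unsigned carry_in, unsigned *carry_out)
{
    uint64_t wide = (uint64_t)dst + src + (carry_in & 1u);
    *carry_out = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* 64-bit add from 32-bit halves: the low words are added with carry-in 0
 * (what a carry-setting add such as ADDI provides), then the high words
 * are added with ADDC consuming that carry. */
uint64_t add64(uint64_t a, uint64_t b)
{
    unsigned c;
    uint32_t lo = addc((uint32_t)a, (uint32_t)b, 0, &c);
    uint32_t hi = addc((uint32_t)(a >> 32), (uint32_t)(b >> 32), c, &c);
    return ((uint64_t)hi << 32) | lo;
}
```

The first call in the chain is written as `addc` with a carry-in of 0 only to keep the sketch short; on the processor it would be an ordinary add that sets C.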
Assembly Language Instructions



Add Integer With Carry, 3 Operands ADDC3 



Syntax ADDC3 src2, src1, dst
Operation src1 + src2 + C -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
 dst register mode (any register in CPU primary register file)

Encoding
Type 1

 31-29  28  27-23  22-21  20-16  15-8  7-0
 001    0   00000  T      dst    src1  src2

Type 2

 31-29  28  27-23  22-21  20-16  15-8  7-0
 001    1   00000  T      dst    src1  src2



Instruction Word Fields

Type 1

T   src1 addressing modes                     src2 addressing modes
00  register mode (any CPU register)          register mode (any CPU register)
01  indirect mode (disp = 0, 1, IR0, IR1)     register mode (any CPU register)
10  register mode (any CPU register)          indirect mode (disp = 0, 1, IR0, IR1)
11  indirect mode (disp = 0, 1, IR0, IR1)     indirect mode (disp = 0, 1, IR0, IR1)

Type 2

T   src1 addressing modes                                  src2 addressing modes
00  register mode (any CPU register)                       8-bit signed immediate
01  register mode (any CPU register)                       indirect mode *+ARn(5-bit unsigned displacement)
10  indirect mode *+ARn(5-bit unsigned displacement)       8-bit signed immediate
11  indirect mode *+ARn1(5-bit unsigned displacement)      indirect mode *+ARn2(5-bit unsigned displacement)



Description The sum of the src1 and src2 operands and the value of the C (carry) flag is loaded into the dst register. The src1, src2, and dst operands are assumed to be signed integers.

Cycles 1 

Status Bits If ST(SET COND) = 0, the condition flags are modified if the destination register is R0-R11. If ST(SET COND) = 1, they are modified for all destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a carry occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value. 



Add Floating-Point Values ADDF 



Syntax ADDF src, dst

Operation dst + src -> dst

Operands src general addressing modes (G):

 0 0 register (R0-R11)
 0 1 direct
 1 0 indirect
 1 1 immediate

 dst register (R0-R11)



Encoding

 31-29  28-23   22-21  20-16  15-0
 000    000011  G      dst    src



Description The sum of the dst and src operands is loaded into the dst register. The dst and src operands are assumed to be floating-point numbers.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value. 

Example ADDF *AR4++(IR1),R5

Before Instruction:

AR4 = 80 9800h
IR1 = 12Bh
R5 = 05 7980 0000h = 6.23750e+01
Data at 80 9800h = 086B 2800h = 4.7031250e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR4 = 80 992Bh
IR1 = 12Bh
R5 = 09 052C 0000h = 5.3268750e+02
Data at 80 9800h = 086B 2800h = 4.7031250e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
ADDF3 Add Floating-Point Values, 3 Operands



Syntax ADDF3 src2, src1, dst
Operation src1 + src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
 dst register mode (R0-R11)



Encoding
Type 1

 31-29  28  27-23  22-21  20-16  15-8  7-0
 001    0   00001  T      dst    src1  src2

Type 2

 31-29  28  27-23  22-21  20-16  15-8  7-0
 001    1   00001  T      dst    src1  src2



Instruction Word Fields

Type 1

T   src1 addressing modes                     src2 addressing modes
00  register mode (R0-R11)                    register mode (R0-R11)
01  indirect mode (disp = 0, 1, IR0, IR1)     register mode (R0-R11)
10  register mode (R0-R11)                    indirect mode (disp = 0, 1, IR0, IR1)
11  indirect mode (disp = 0, 1, IR0, IR1)     indirect mode (disp = 0, 1, IR0, IR1)

Type 2

T   src1 addressing modes                                  src2 addressing modes
01  register mode (any CPU register)                       indirect mode *+ARn(5-bit unsigned displacement)
11  indirect mode *+ARn1(5-bit unsigned displacement)      indirect mode *+ARn2(5-bit unsigned displacement)






Description The sum of the srd and src2 operands is loaded into the dst register. The 
srd, src2, and dst operands are assumed to be floating-point numbers. 



Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise.

UF 1 if a floating-point underflow occurs, 0 otherwise.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example ADDF3 *+AR1(2), *+AR1(8), R4

Before Instruction:

AR1 = 2F F820h
R4 = 0h

Data at 2F F822h = 700 F000h = 1.2893750e+02
Data at 2F F828h = 34C 2000h = 1.27578125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 2F F820h

R4 = 070DB2 0000h = 1.41695313e+02
Data at 2F F828h = 34C 2000h = 1.27578125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0



ADDF3||STF Parallel ADDF3 and STF

Syntax ADDF3 src2, src1, dst1
|| STF src3, dst2

Operation src1 + src2 → dst1
|| src3 → dst2

Operands src1 register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding
[Encoding figure: 32-bit instruction word. Fields, from bit 31 down to bit 0: opcode bits, dst1, src1, src3, dst2, src2.]



Description A floating-point addition and a floating-point store are performed in parallel. 

All registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (STF) reads from a 
register and the operation being performed in parallel (ADDF3) writes to the 
same register, then STF accepts as input the contents of the register before 
it is modified by the ADDF3. 

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 
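The read-before-write rule can be modeled as two phases inside one execute cycle: every operand is sampled first, then both results are written. The sketch below is a simplified, hypothetical model (registers and memory words as C doubles, memory indexed by word, status flags ignored), not a cycle-accurate simulator; the ADDI3||STI, AND3||STI, and ASH3||STI descriptions state the same ordering.

```c
#include <stdint.h>

/* Hypothetical register file: only R0-R7, the registers allowed as
   src1, dst1, and src3 in ADDF3 || STF. */
typedef struct {
    double r[8];
} regfile_t;

/* One ADDF3 || STF execute cycle. */
void addf3_stf(regfile_t *rf, double *mem,
               int src2_addr, int src1, int dst1,
               int src3, int dst2_addr)
{
    /* read phase: all operands are sampled before anything is written */
    double a = rf->r[src1];
    double b = mem[src2_addr];          /* src2 is read before the dst2 write */
    double store_value = rf->r[src3];   /* old value, even if src3 == dst1 */

    /* write phase: both results are loaded at the end of the cycle */
    rf->r[dst1] = a + b;
    mem[dst2_addr] = store_value;
}
```

With the values used in this instruction's example (R2 = 140.5, memory word = 179.75, R4 = 62.8125), the model leaves 320.25 in R5 and stores 62.8125. If src3 names the same register as dst1, the stored value is that register's contents before the addition.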



Example ADDF3 *+AR3(IR1), R2, R5

|| STF R4, *AR2

Before Instruction:

AR3 = 80 9800h
IR1 = 0A5h

R2 = 070C80 0000h = 1.4050e+02
R5 = 0h

R4 = 057B40 0000h = 6.281250e+01
AR2 = 80 98F3h

Data at 80 98A5h = 733 C000h = 1.79750e+02
Data at 80 98F3h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:

AR3 = 80 9800h
IR1 = 0A5h

R2 = 070C80 0000h = 1.4050e+02
R5 = 082020 0000h = 3.20250e+02
R4 = 057B40 0000h = 6.281250e+01
AR2 = 80 98F3h

Data at 80 98A5h = 733 C000h = 1.79750e+02
Data at 80 98F3h = 57B 4000h = 6.28125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0
ADDI Add Integer

Syntax ADDI src, dst
Operation dst + src → dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding
  bits 31-23: 0 0 0 0 0 0 1 0 0 | bits 22-21: G | bits 20-16: dst | bits 15-0: src







Description The sum of the dst and src operands is loaded into the dst register. The
dst and src operands are assumed to be signed integers.

Cycles 1

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a carry occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value.

Example ADDI R3, R7

Before Instruction:

R3 = 0FFFF FFCBh = -53
R7 = 35h = 53

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R3 = 0FFFF FFCBh = -53
R7 = 0h

LUF LV UF N Z V C = 0 0 0 0 1 0 1
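The ADDI flag rules can be checked with a small C model. This is a sketch, not TI's implementation: it assumes OVM = 0 (so no saturation is modeled), considers only the 32-bit integer result, leaves out the latched LV flag, and uses the hypothetical names addi and flags_t.

```c
#include <stdint.h>

/* Hypothetical flag record for the N, Z, V, and C status bits. */
typedef struct { int n, z, v, c; } flags_t;

/* 32-bit ADDI: dst + src -> result, with N, Z, V, C as listed above. */
uint32_t addi(uint32_t dst, uint32_t src, flags_t *f)
{
    uint64_t wide = (uint64_t)dst + (uint64_t)src;   /* keeps the carry out */
    uint32_t result = (uint32_t)wide;

    f->c = (int)((wide >> 32) & 1u);                 /* carry out of bit 31 */
    /* signed overflow: both operands share a sign that the result lacks */
    f->v = (int)((~(dst ^ src) & (dst ^ result)) >> 31);
    f->n = (int)(result >> 31);
    f->z = (result == 0);
    return result;
}
```

With the example's operands (R7 = 53, R3 = -53), the model returns 0 with Z = 1 and C = 1, since 0FFFF FFCBh + 35h carries out of bit 31.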






Add Integer, 3 Operands ADDI3 



Syntax ADDI3 src2, src1, dst
Operation src1 + src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1
  bits 31-23: 0 0 1 0 0 0 0 1 0 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2

Type 2
  bits 31-23: 0 0 1 1 0 0 0 1 0 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2



Instruction Word Fields

Type 1
T  | src1 addressing modes                 | src2 addressing modes
00 | register mode (any CPU register)      | register mode (any CPU register)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (any CPU register)
10 | register mode (any CPU register)      | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T  | src1 addressing modes                             | src2 addressing modes
00 | register mode (any CPU register)                  | 8-bit signed immediate
01 | register mode (any CPU register)                  | indirect mode *+ARn(5-bit unsigned displacement)
10 | indirect mode *+ARn(5-bit unsigned displacement)  | 8-bit signed immediate
11 | indirect mode *+ARn1(5-bit unsigned displacement) | indirect mode *+ARn2(5-bit unsigned displacement)
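The three-operand word can be unpacked with shifts and masks. This sketch reads the layout from the encoding rows (bits 31-29 = 001, bit 28 = 0 for type 1 and 1 for type 2, a 5-bit opcode in bits 27-23, T in bits 22-21, dst in 20-16, src1 in 15-8, src2 in 7-0). It applies the type 2 table's 8-bit signed immediate rule for T = 00 and T = 10, which holds for integer instructions such as ADDI3 but not for ADDF3, whose type 2 table lists only T = 01 and T = 11. Register fields are returned as raw numbers; mapping them to register names is not modeled. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded form of a three-operand instruction word. */
typedef struct {
    int type;          /* 1 or 2 */
    unsigned opcode;   /* 5-bit operation code, e.g. 00010b for ADDI3 */
    unsigned t;        /* addressing-mode selector T */
    unsigned dst;      /* raw dst field */
    unsigned src1;     /* raw src1 field */
    unsigned src2;     /* raw src2 field */
    int has_imm;       /* 1 if src2 is an 8-bit signed immediate */
    int32_t imm;       /* sign-extended immediate when has_imm is 1 */
} three_op_t;

/* Returns 1 on success, 0 if bits 31-29 are not 001. */
int decode_three_operand(uint32_t w, three_op_t *out)
{
    if ((w >> 29) != 1u)
        return 0;
    out->type   = ((w >> 28) & 1u) ? 2 : 1;
    out->opcode = (w >> 23) & 0x1Fu;
    out->t      = (w >> 21) & 0x3u;
    out->dst    = (w >> 16) & 0x1Fu;
    out->src1   = (w >> 8) & 0xFFu;
    out->src2   = w & 0xFFu;
    /* integer type 2: T = 00 and T = 10 both carry an immediate in src2 */
    out->has_imm = (out->type == 2) && ((out->t & 1u) == 0);
    out->imm = out->has_imm
                   ? (int32_t)out->src2 - ((out->src2 & 0x80u) ? 256 : 0)
                   : 0;
    return 1;
}
```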

Description The sum of the src1 and src2 operands is loaded into the dst register. The
src1, src2, and dst operands are assumed to be signed integers.

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a carry occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 



Parallel ADDI3 and STI ADDI3||STI 



Syntax ADDI3 src2, src1, dst1
|| STI src3, dst2

Operation src1 + src2 → dst1
|| src3 → dst2

Operands src1 register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding
[Encoding figure: 32-bit instruction word. Fields, from bit 31 down to bit 0: opcode bits, dst1, src1, src3, dst2, src2.]



Description An integer addition and an integer store are performed in parallel. All regis- 
ters are read at the beginning and loaded at the end of the execute cycle. 
This means that if one of the parallel operations (STI) reads from a register 
and the operation being performed in parallel (ADDI3) writes to the same 
register, then STI accepts as input the contents of the register before it is 
modified by the ADDI3. 

If src2 and dst2 point to the same location, src2 is read before the write to 
dst2. 

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C 1 if a carry occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 






Example ADDI3 *AR0--(IR0), R5, R0

|| STI R3, *AR7

Before Instruction:

AR0 = 80 992Ch

IR0 = 0Ch

R5 = 0DCh = 220

R0 = 0h

R3 = 35h = 53

AR7 = 80 983Bh

Data at 80 992Ch = 12Ch = 300

Data at 80 983Bh = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR0 = 80 9920h

IR0 = 0Ch

R5 = 0DCh = 220

R0 = 208h = 520

R3 = 35h = 53

AR7 = 80 983Bh

Data at 80 992Ch = 12Ch = 300

Data at 80 983Bh = 35h = 53

LUF LV UF N Z V C = 0 0 0 0 0 0 0



Bitwise Logical-AND AND 



Syntax AND src, dst

Operation dst AND src → dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect

1 1 immediate (not sign-extended)
dst register (any register in CPU primary register file)



Encoding
  bits 31-23: 0 0 0 0 0 0 1 0 1 | bits 22-21: G | bits 20-16: dst | bits 15-0: src



Description The bitwise logical-AND between the dst and src operands is loaded into
the dst register. The dst and src operands are assumed to be unsigned integers.

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example AND R1, R2

Before Instruction:

R1 = 80h
R2 = 0AFFh
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R1 = 80h
R2 = 80h

LUF LV UF N Z V C = 0 0 0 0 0 0 1
AND3 Bitwise Logical-AND, 3 Operands

Syntax AND3 src2, src1, dst
Operation src1 AND src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1
  bits 31-23: 0 0 1 0 0 0 0 1 1 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2

Type 2
  bits 31-23: 0 0 1 1 0 0 0 1 1 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2



Instruction Word Fields

Type 1
T  | src1 addressing modes                 | src2 addressing modes
00 | register mode (any CPU register)      | register mode (any CPU register)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (any CPU register)
10 | register mode (any CPU register)      | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T  | src1 addressing modes                             | src2 addressing modes
00 | register mode (any CPU register)                  | 8-bit signed immediate
01 | register mode (any CPU register)                  | indirect mode *+ARn(5-bit unsigned displacement)
10 | indirect mode *+ARn(5-bit unsigned displacement)  | 8-bit signed immediate
11 | indirect mode *+ARn1(5-bit unsigned displacement) | indirect mode *+ARn2(5-bit unsigned displacement)



Description The bitwise logical-AND between the src1 and src2 operands is loaded into
the dst register. The src1, src2, and dst operands are assumed to be unsigned
integers.

Cycles 1

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 






AND3||STI Parallel AND3 and STI
Syntax AND3 src2, src1, dst1
|| STI src3, dst2

Operation src1 AND src2 → dst1
|| src3 → dst2

Operands src1 register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding
[Encoding figure: 32-bit instruction word. Fields, from bit 31 down to bit 0: opcode bits, dst1, src1, src3, dst2, src2.]



Description A bitwise logical-AND and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute cycle.
This means that if one of the parallel operations (STI) reads from a register
and the operation being performed in parallel (AND3) writes to the same
register, then STI accepts as input the contents of the register before it is
modified by the AND3.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Cycles 1

Status Bits LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.

V 0.

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.



Example AND3 *+AR1(IR0), R4, R7

|| STI R3, *AR2

Before Instruction:

AR1 = 80 99F1h

IR0 = 8h

R4 = 0A323h

R7 = 0h

R3 = 35h = 53

AR2 = 80 983Fh

Data at 80 99F9h = 5C53h

Data at 80 983Fh = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 99F1h

IR0 = 8h

R4 = 0A323h

R7 = 03h

R3 = 35h = 53

AR2 = 80 983Fh

Data at 80 99F9h = 5C53h

Data at 80 983Fh = 35h = 53

LUF LV UF N Z V C = 0 0 0 0 0 0 0
ANDN Bitwise Logical-AND With Complement

Syntax ANDN src, dst

Operation dst AND ~src → dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect

1 1 immediate (not sign-extended)
dst register (any register in CPU primary register file)



Encoding
  bits 31-23: 0 0 0 0 0 0 1 1 0 | bits 22-21: G | bits 20-16: dst | bits 15-0: src



Description The bitwise logical-AND between the dst operand and the bitwise logical 
complement (~) of the src operand is loaded into the dst register. The dst 
and src operands are assumed to be unsigned integers. 

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example ANDN @980Ch,R2 

Before Instruction: 

DP = 80h 
R2 = 0C2Fh 

Data at 80 980Ch = 0A02h 

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R2 = 042Dh

Data at 80 980Ch = 0A02h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
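The AND and ANDN results and their flag rules (N = MSB of the output, Z = 1 for a zero result, V = 0, C unaffected) reduce to a few lines of C. This is a hedged sketch: it assumes 32-bit integer operands, models only N, Z, and V, and the function names are hypothetical. The logic_imm16 helper reflects the "immediate (not sign-extended)" operand rule by zero-extending a 16-bit immediate.

```c
#include <stdint.h>

/* Hypothetical flag record; C, LUF, and LV are left unchanged by these
   instructions and are therefore not modeled. */
typedef struct { int n, z, v; } logic_flags_t;

static uint32_t set_logic_flags(uint32_t r, logic_flags_t *f)
{
    f->n = (int)(r >> 31);   /* N: MSB of the output */
    f->z = (r == 0);         /* Z: zero result */
    f->v = 0;                /* V: always cleared */
    return r;
}

/* AND: dst AND src */
uint32_t and_op(uint32_t dst, uint32_t src, logic_flags_t *f)
{
    return set_logic_flags(dst & src, f);
}

/* ANDN: dst AND (bitwise complement of src) */
uint32_t andn_op(uint32_t dst, uint32_t src, logic_flags_t *f)
{
    return set_logic_flags(dst & ~src, f);
}

/* Immediate operand of AND/ANDN: zero-extended, not sign-extended. */
uint32_t logic_imm16(uint16_t imm)
{
    return (uint32_t)imm;
}
```

With the example values, and_op(0AFFh, 80h) returns 80h and andn_op(0C2Fh, 0A02h) returns 042Dh.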



Bitwise Logical-ANDN, 3 Operands ANDN3 



Syntax ANDN3 src2, src1, dst
Operation src1 AND ~src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)



Encoding
Type 1
  bits 31-23: 0 0 1 0 0 0 1 0 0 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2

Type 2
  bits 31-23: 0 0 1 1 0 0 1 0 0 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2



Instruction Word Fields

Type 1
T  | src1 addressing modes                 | src2 addressing modes
00 | register mode (any CPU register)      | register mode (any CPU register)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (any CPU register)
10 | register mode (any CPU register)      | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T  | src1 addressing modes                             | src2 addressing modes
00 | register mode (any CPU register)                  | 8-bit signed immediate
01 | register mode (any CPU register)                  | indirect mode *+ARn(5-bit unsigned displacement)
10 | indirect mode *+ARn(5-bit unsigned displacement)  | 8-bit signed immediate
11 | indirect mode *+ARn1(5-bit unsigned displacement) | indirect mode *+ARn2(5-bit unsigned displacement)

Description The bitwise logical-AND between the src1 operand and the bitwise logical
complement (~) of the src2 operand is loaded into the dst register. The src1,
src2, and dst operands are assumed to be unsigned integers.

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise.
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 



Arithmetic Shift ASH 



Syntax ASH count, dst
Operation If (count ≥ 0):
dst << count → dst
Else:
dst >> |count| → dst

Operands count general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect

1 1 immediate

dst register (any register in CPU primary register file)



Encoding
  bits 31-23: 0 0 0 0 0 0 1 1 1 | bits 22-21: G | bits 20-16: dst | bits 15-0: count



Description The seven least-significant bits of the count operand are used to generate 
the twos-complement shift count of up to 32 bits. 

If the count operand is greater than zero, the dst operand is left-shifted by 
the value of the count operand. Low-order bits shifted in are zero-filled, and 
high-order bits are shifted out through the C (carry) bit. 

Arithmetic left-shift: [figure: C ← dst ← 0]

If the count operand is less than zero, the dst operand is right-shifted by the
absolute value of the count operand. The high-order bits of the dst operand
are sign-extended as it is right-shifted. Low-order bits are shifted out
through the C (carry) bit.

Arithmetic right-shift: [figure: sign of dst → dst → C]

If the count operand is zero, no shift is performed, and the C (carry) bit is 
set to 0. The count and dst operands are assumed to be signed integers. 
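The count decoding and carry rules above can be sketched in C. Assumptions: the value is a 32-bit integer, the shift count is taken from the seven LSBs of the count operand and read as a twos-complement number, and only counts from -31 to 31 are modeled (larger magnitudes are outside this sketch, which asserts instead). The names ash and ash_shift_count are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seven LSBs of the count operand, interpreted as twos complement. */
static int ash_shift_count(uint32_t count_operand)
{
    int c = (int)(count_operand & 0x7Fu);
    return (c >= 64) ? c - 128 : c;
}

/* Arithmetic shift of a 32-bit value; *carry receives the C bit. */
uint32_t ash(uint32_t count_operand, uint32_t value, int *carry)
{
    int c = ash_shift_count(count_operand);
    assert(c >= -31 && c <= 31);            /* range modeled by this sketch */

    if (c == 0) {                           /* no shift, C set to 0 */
        *carry = 0;
        return value;
    }
    if (c > 0) {                            /* left shift, zero fill */
        *carry = (int)((value >> (32 - c)) & 1u);   /* last bit out of the MSB */
        return value << c;
    }
    int n = -c;                             /* right shift, sign fill */
    *carry = (int)((value >> (n - 1)) & 1u);        /* last bit out of the LSB */
    uint32_t r = value >> n;
    if (value & 0x80000000u)
        r |= ~(0xFFFFFFFFu >> n);           /* replicate the sign bit */
    return r;
}
```

A count operand of 0FFE8h has 68h in its seven LSBs, which is -24, so the value is shifted right by 24; under this model 0AEC0 0001h becomes 0FFFF FFAEh with C = 1, as in Example 2.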



Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Set to the value of the last bit shifted out. 0 for a shift count of 0. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example 1 ASH R1, R3

Before Instruction:

R1 = 10h = 16
R3 = 0AE000h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:

R1 = 10h

R3 = 0E000 0000h

LUF LV UF N Z V C = 0 1 0 1 0 1 0
Example 2 ASH @98C3h, R5

Before Instruction:
DP = 80h

R5 = 0AEC0 0001h

Data at 80 98C3h = 0FFE8h = -24

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h

R5 = 0FFFF FFAEh

Data at 80 98C3h = 0FFE8h = -24

LUF LV UF N Z V C = 0 0 0 1 0 0 1



Arithmetic Shift, 3 Operands ASH3 



Syntax ASH3 count, src, dst
Operation If (count ≥ 0):
src << count → dst
Else:
src >> |count| → dst

Operands src, count both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1
  bits 31-23: 0 0 1 0 0 0 1 0 1 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2

Type 2
  bits 31-23: 0 0 1 1 0 0 1 0 1 | bits 22-21: T | bits 20-16: dst | bits 15-8: src1 | bits 7-0: src2



Instruction Word Fields

Type 1
T  | src1 addressing modes                 | src2 addressing modes
00 | register mode (any CPU register)      | register mode (any CPU register)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (any CPU register)
10 | register mode (any CPU register)      | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T  | src1 addressing modes                             | src2 addressing modes
00 | register mode (any CPU register)                  | 8-bit signed immediate
01 | register mode (any CPU register)                  | indirect mode *+ARn(5-bit unsigned displacement)
10 | indirect mode *+ARn(5-bit unsigned displacement)  | 8-bit signed immediate
11 | indirect mode *+ARn1(5-bit unsigned displacement) | indirect mode *+ARn2(5-bit unsigned displacement)



Description The seven least-significant bits of the count operand are used to generate
the twos-complement shift count.

If the count operand is greater than zero, the src operand is left-shifted by
the value of count. Low-order bits shifted in are zero-filled, and high-order
bits are shifted out through the status register's C (carry) bit.

Arithmetic left-shift: [figure: C ← src ← 0]

If the count operand is less than zero, the src operand is right-shifted by the
absolute value of count (for example, a count of -4 right-shifts by 4). The
high-order bits of the src operand are sign-extended as they are right-shifted.
Low-order bits are shifted out through the C (carry) bit.

Arithmetic right-shift: [figure: sign of src → src → C]

If the count operand is zero, no shift is performed, and the C (carry) bit is
set to 0. The count, src, and dst operands are assumed to be signed integers.

Cycles 1

Status Bits LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Set to the value of the last bit shifted out. 0 for a shift count of 0. 

Mode Bit OVM Operation is not affected by OVM bit value. 
Parallel ASH3 and STI ASH3||STI



Syntax ASH3 count, src2, dst1
|| STI src3, dst2

Operation If (count ≥ 0):
src2 << count → dst1
Else:
src2 >> |count| → dst1
|| src3 → dst2

Operands count register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding
[Encoding figure: 32-bit instruction word. Fields, from bit 31 down to bit 0: opcode bits, dst1, count, src3, dst2, src2.]





Description The seven least-significant bits of the count operand register are used to
generate the twos-complement shift count of up to 32 bits.

If the count operand is greater than zero, the src2 operand is left-shifted by
the value of the count operand, and the result is loaded into dst1. Low-order
bits shifted in are zero-filled, and high-order bits are shifted out through
the C (carry) bit.

Arithmetic left-shift: [figure: C ← src2 ← 0]

If the count operand is less than zero, the src2 operand is right-shifted by the
absolute value of the count operand, and the result is loaded into dst1. The
high-order bits of the src2 operand are sign-extended as it is right-shifted.
Low-order bits are shifted out through the C (carry) bit.

Arithmetic right-shift: [figure: sign of src2 → src2 → C]

If the count operand is zero, no shift is performed, and the C (carry) bit is
set to 0. The count, src2, and dst1 operands are assumed to be signed integers.



All registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (STI) reads from a 
register and the operation being performed in parallel (ASH3) writes to the 
same register, then STI accepts as input the contents of the register before 
it is modified by the ASH3. If src2 and dst2 point to the same location, src2 
is read before the write to dst2. 



Cycles 1

Status Bits LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if an integer overflow occurs, 0 otherwise.

C Set to the value of the last bit shifted out. 0 for a shift count of 0.

Mode Bit OVM Operation is not affected by OVM bit value.
Example ASH3 R1, *AR6++(IR1), R0

|| STI R5, *AR2

Before Instruction:

AR6 = 80 9900h
IR1 = 8Ch

R1 = 0FFE8h = -24
R0 = 0h
R5 = 35h = 53
AR2 = 80 98A2h

Data at 80 9900h = 0AE00 0000h
Data at 80 98A2h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:

AR6 = 80 998Ch
IR1 = 8Ch

R1 = 0FFE8h = -24
R0 = 0FFFF FFAEh
R5 = 35h = 53
AR2 = 80 98A2h

Data at 80 9900h = 0AE00 0000h

Data at 80 98A2h = 35h = 53

LUF LV UF N Z V C = 0 0 0 1 0 0 0



Branch Conditionally (Standard) Bcond 



Syntax Bcond src
Operation If cond is true:
If src is in register addressing mode (any register in CPU primary
register file),
src → PC.
If src is in PC-relative mode (label or address),
displacement + PC + 1 → PC.
Else, continue.

Operands src conditional-branch addressing modes (B):
0 register
1 PC-relative

Encoding
  bits 31-26: 0 1 1 0 1 0 | bit 25: B | bits 24-21: 0 0 0 0 | bits 20-16: cond | bits 15-0: register or displacement





Description Bcond signifies a standard branch that executes in four cycles. A branch is
performed if the condition is true; a pipeline flush also occurs on a true
condition (see Section 10.2 on page 10-4). If the src operand is expressed
in register addressing mode, the contents of the specified register are
loaded into the PC. If the src operand is expressed in PC-relative mode, the
assembler generates a displacement: displacement = label - (PC of branch
instruction + 1). This displacement is stored as a 16-bit signed integer in the
16 least significant bits of the branch instruction word. This displacement is
added to the PC of the branch instruction plus 1 to generate the new PC.

The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).
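The displacement arithmetic can be written out directly. In this sketch the pipeline offset is a parameter: 1 for Bcond, as stated above, and 3 for the delayed forms (BcondD, BcondAF, BcondAT), whose entries use PC + 3. The range check reflects the 16-bit signed displacement field. The function names are hypothetical; in practice the assembler performs this calculation.

```c
#include <stdint.h>

/* displacement = target - (PC of branch + pipeline_offset).
   Returns 1 and stores the value if it fits in 16 bits, 0 otherwise. */
int branch_displacement(uint32_t branch_pc, uint32_t target,
                        int pipeline_offset, int16_t *disp)
{
    int64_t d = (int64_t)target - ((int64_t)branch_pc + pipeline_offset);
    if (d < INT16_MIN || d > INT16_MAX)
        return 0;                       /* does not fit the 16-bit field */
    *disp = (int16_t)d;
    return 1;
}

/* New PC = PC of branch + pipeline_offset + displacement. */
uint32_t branch_target(uint32_t branch_pc, int16_t disp, int pipeline_offset)
{
    return (uint32_t)((int64_t)branch_pc + pipeline_offset + disp);
}
```

For instance, with the values from the BNZD example (PC = 50h, displacement 24h, offset 3), branch_target returns 77h.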



Cycles 1

Status Bits LUF Unaffected.

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example BZ R0

Before Instruction:

PC = 2B00h

R0 = 0003 FF00h

LUF LV UF N Z V C = 0 0 0 0 1 0 0

After Instruction:

PC = 3FF00h
R0 = 0003 FF00h

LUF LV UF N Z V C = 0 0 0 0 1 0 0



Branch Conditionally Delayed and Annul If False BcondAF



Syntax BcondAF src
Operation If (cond is true):
If (src is a register)
src → PC
If (src is a displacement)
src + PC of branch + 3 → PC
Else (cond is false):
annul the execute-phase results of the next three instructions and continue.

Operands src conditional-branch addressing modes

Encoding
  bits 31-26: 0 1 1 0 1 0 | bit 25: B | bits 24-21: 0 1 0 1 | bits 20-16: cond | bits 15-0: register or displacement

Instruction Word Fields
B | src addressing modes
0 | register mode
1 | PC-relative mode



Description If the condition is true, a branch is performed and the three instructions
following the branch instruction are executed. If the condition is false, the
effects of the execute phase of the next three instructions are annulled and
execution continues. If the src operand is in register mode, the contents of
the specified register are loaded into the PC. If the src operand is in
PC-relative mode, the sum of the PC of the branch instruction + 3 and the src
is loaded into the PC. In PC-relative mode, the src field is interpreted as a
16-bit signed integer.

None of the three instructions following BcondAF may be an instruction that
modifies program flow. Interrupts are disabled for the duration of the BcondAF
instruction. BcondAF is particularly useful for controlling the exit at the
bottom of a loop.






Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
Branch Conditionally Delayed and Annul If True BcondAT



Syntax BcondAT src
Operation If (cond is true):
If (src is a register)
src → PC
annul the execute-phase results of the next three instructions.
If (src is a displacement)
src + PC of branch + 3 → PC
annul the execute-phase results of the next three instructions.
Else, continue.

Operands src conditional-branch addressing modes

Encoding
  bits 31-26: 0 1 1 0 1 0 | bit 25: B | bits 24-21: 0 0 1 1 | bits 20-16: cond | bits 15-0: register or displacement

Instruction Word Fields
B | src addressing modes
0 | register mode
1 | PC-relative mode



Description If the condition is true, a branch is performed and the effects of the execute
phase of the next three instructions are annulled. If the src operand is
expressed in register mode, the contents of the specified register are loaded
into the PC. If the src operand is in PC-relative mode, the sum of the PC of
the branch instruction + 3 and the src is loaded into the PC. In PC-relative
mode, the src field is interpreted as a 16-bit signed integer.

None of the three instructions following BcondAT may be an instruction that
modifies program flow. Interrupts are disabled for the duration of the BcondAT
instruction.

The BcondAT instruction does not annul the status signals at the external
interfaces. BcondAT is particularly useful for controlling entry at the top
of a loop.
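The delayed branches differ only in what happens to the three instructions that follow them. The sketch below tabulates that behavior for BcondD, BcondAF, and BcondAT. It is a hypothetical model, not TI code, and it assumes that when no branch is taken, execution resumes at the instruction after the third delay slot (branch address + 4).

```c
#include <stdint.h>

/* Hypothetical labels for the three delayed-branch forms. */
typedef enum { BR_DELAYED, BR_ANNUL_IF_FALSE, BR_ANNUL_IF_TRUE } delayed_kind_t;

typedef struct {
    int slots_take_effect;   /* 1 if the three following instructions' results are kept */
    uint32_t next_pc;        /* address fetched after the third delay slot */
} delayed_outcome_t;

delayed_outcome_t delayed_branch(delayed_kind_t kind, int cond,
                                 uint32_t branch_pc, uint32_t target)
{
    delayed_outcome_t out;
    out.next_pc = cond ? target : branch_pc + 4;   /* fall through past the slots */
    switch (kind) {
    case BR_ANNUL_IF_FALSE:                        /* BcondAF */
        out.slots_take_effect = cond;
        break;
    case BR_ANNUL_IF_TRUE:                         /* BcondAT */
        out.slots_take_effect = !cond;
        break;
    default:                                       /* BcondD: never annuls */
        out.slots_take_effect = 1;
        break;
    }
    return out;
}
```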



Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Branch Conditionally (Delayed) BcondD 



Syntax BcondD src

Operation If cond is true:
If src is in register addressing mode (any register in CPU primary
register file),
src -> PC.
If src is in PC-relative mode (label or address),
displacement + PC + 3 -> PC.
Else, continue.

Operands src conditional-branch addressing modes (B):
0 register
1 PC-relative

Encoding



31 24 23 16 15 8 7 0
0 1 1 0 1 0 | B | 0 0 0 1 | cond | register or displacement



Description BcondD signifies a delayed branch that allows the three instructions after
the delayed branch to be fetched before the PC is modified. The effect is a
single-cycle branch, and the three instructions following BcondD do not
affect the condition. None of the three instructions following BcondD may be
an instruction that modifies program flow.

A branch is performed if the condition is true. If the src operand is expressed
in register addressing mode, the contents of the specified register are
loaded into the PC. If the src operand is expressed in PC-relative mode, the
assembler generates a displacement: displacement = label - (PC of branch
instruction + 3). This displacement is stored as a 16-bit signed integer in the
16 least significant bits of the branch instruction. This displacement is added
to the PC of the branch instruction plus 3 to generate the new PC. The
TMS320C40 provides 20 condition codes that can be used with this instruction
(see Section 11.2 on page 11-10 for a list of condition mnemonics, encoding,
and flags).
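The PC-relative arithmetic for BcondD can be checked with a short C sketch. It models only the address calculation described above, not the pipeline or the condition test, and the function names are hypothetical. With the values from the BNZD example (branch at 50h, displacement 24h) it yields the new PC 77h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the displacement the assembler emits for a delayed
   branch (BcondD) located at branch_pc whose target is label. */
int32_t delayed_branch_displacement(uint32_t branch_pc, uint32_t label)
{
    int32_t disp = (int32_t)(label - (branch_pc + 3u));
    /* The displacement must fit in the 16-bit signed src field. */
    assert(disp >= INT16_MIN && disp <= INT16_MAX);
    return disp;
}

/* Hypothetical helper: new PC when the branch is taken. The 16-bit field
   (bits 15-0) is sign-extended and added to PC of branch + 3. */
uint32_t delayed_branch_target(uint32_t branch_pc, uint16_t src_field)
{
    int32_t disp = (int16_t)src_field;       /* sign-extend bits 15-0 */
    return branch_pc + 3u + (uint32_t)disp;  /* wraps modulo 2^32 */
}
```

A field of FFFFh is -1, so a branch at 100h with that field lands on 102h.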



Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example BNZD 36 (36 = 24h)

Before Instruction:
PC = 50h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
PC = 77h
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Branch Unconditionally (Standard) BR 



Syntax BR src 

Operation PC + 1 + src -> PC

Operands src 24-bit signed immediate displacement 

Encoding

31 24 23 16 15 8 7 0
0 1 1 0 0 0 0 0 | src (displacement)







Description Performs an unconditional standard (nondelayed) branch. The src operand
is assumed to be a 24-bit signed integer.

Cycles 4 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



BRD Branch Unconditionally (Delayed)

Syntax BRD src 
Operation PC + 3 + src -> PC
Operands src 24-bit signed immediate displacement 
Encoding

31 24 23 16 15 8 7 0
0 1 1 0 0 0 0 1 | src (displacement)



Description Performs an unconditional delayed branch. The src operand is assumed to 
be a 24-bit signed integer. Interrupts are disabled during the BRD 
instruction. 

The three instructions following the BRD instruction are fetched and 
executed. None of these three instructions may modify the program flow 
(e.g., affect the PC value). 
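Both unconditional branches take a 24-bit signed displacement: BR adds it to PC + 1 and BRD adds it to PC + 3. A minimal C sketch of that arithmetic, assuming a 32-bit PC value; the helper names are hypothetical and only the address calculation is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 24-bit src field (bits 23-0 of the instruction word). */
int32_t sign_extend24(uint32_t field)
{
    assert((field & 0xFF000000u) == 0u);  /* only 24 bits are meaningful */
    return (field & 0x00800000u) ? (int32_t)(field | 0xFF000000u)
                                 : (int32_t)field;
}

/* New PC for BR (standard: PC + 1 + src). */
uint32_t br_target(uint32_t pc, uint32_t field)
{
    return pc + 1u + (uint32_t)sign_extend24(field);
}

/* New PC for BRD (delayed: PC + 3 + src). */
uint32_t brd_target(uint32_t pc, uint32_t field)
{
    return pc + 3u + (uint32_t)sign_extend24(field);
}
```

For example, a BRD at 200h with field FFFFFDh (-3) branches back to itself.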

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Call Subroutine CALL 



Syntax CALL src

Operation Next PC -> *(++SP)
PC + 1 + src -> PC

Operands src 24-bit signed immediate displacement



Encoding

31 24 23 16 15 8 7 0
0 1 1 0 0 0 1 0 | src (displacement)



Description Performs a call. The next PC value is pushed onto the system stack. The
sum of the src operand, 1, and the PC of the CALL instruction is loaded into
the PC. The src operand is assumed to be a 24-bit signed immediate operand
(displacement).

Cycles 4 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



CALLcond Call Subroutine Conditionally



Syntax CALLcond src

Operation If cond is true:
Next PC -> *(++SP)
If src is in register addressing mode (any register in CPU primary
register file),
src -> PC.
If src is in PC-relative mode (label or address),
displacement + PC + 1 -> PC.
Else, continue.

Operands src conditional-branch addressing modes (B):
0 register
1 PC-relative

Encoding



31 24 23 16 15 8 7 0
0 1 1 1 0 0 | B | 0 0 0 0 | cond | register or displacement



Description If the condition is true, a call is performed: the next PC value is pushed
onto the system stack. If the src operand is expressed in register addressing
mode, the contents of the specified register are loaded into the PC. If the
src operand is expressed in PC-relative mode, the assembler generates a
displacement: displacement = label - (PC of call instruction + 1). This
displacement is stored as a 16-bit signed integer in the 16 least significant
bits of the call instruction word. This displacement is added to the PC of the
call instruction plus 1 to generate the new PC.

The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).
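The CALLcond sequence (push the next PC with a preincrement of SP, then load the PC) can be sketched in C as follows. The Machine struct, its small memory window, and the function name are hypothetical stand-ins, and only register-mode addressing is modeled. With the values of the CALLNZ example (PC = 123h, SP = 80 9835h, R5 = 789h), it produces the same PC, SP, and stacked return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical machine state: only what CALLcond touches. */
typedef struct {
    uint32_t pc;
    uint32_t sp;
    uint32_t mem[16];   /* tiny stand-in for data memory */
    uint32_t mem_base;  /* address of mem[0] */
} Machine;

/* Register-mode CALLcond: if cond holds, push next PC with *(++SP),
   then load the target register value into the PC. */
void callcond_register(Machine *m, bool cond, uint32_t target)
{
    if (!cond) {
        m->pc += 1u;                         /* condition false: continue */
        return;
    }
    m->sp += 1u;                             /* preincrement SP */
    assert(m->sp - m->mem_base < 16u);       /* stay inside the model memory */
    m->mem[m->sp - m->mem_base] = m->pc + 1u;  /* next PC onto the stack */
    m->pc = target;
}
```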

Cycles 5 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Example callnz R5

Before Instruction:
PC = 123h
SP = 80 9835h
R5 = 789h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
PC = 789h
SP = 80 9836h
R5 = 789h
Data at 80 9836h = 124h
LUF LV UF N Z V C = 0 0 0 0 0 0 0
CMPF Compare Floating-Point Values



Syntax CMPF src, dst

Operation dst - src

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)

Encoding



31 24 23 16 15 8 7 0
0 0 0 0 0 1 0 0 0 | G | dst | src







Description The src operand is subtracted from the dst operand. The result is not loaded
into any register, thus allowing for nondestructive compares. The dst and
src operands are assumed to be floating-point numbers.

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
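The status-bit rules above distinguish the latched flags (LUF, LV), which CMPF only ever sets, from UF and V, which reflect the current instruction alone. A minimal C sketch of that update rule; the Flags struct and function are hypothetical, and detecting overflow or underflow in the TMS320C40 floating-point format is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the condition flags touched by CMPF. */
typedef struct {
    bool luf, lv, uf, n, z, v, c;
} Flags;

/* Apply the CMPF flag rules. overflow/underflow describe the internal
   dst - src result; result_negative/result_zero describe its value. */
void cmpf_update_flags(Flags *f, bool overflow, bool underflow,
                       bool result_negative, bool result_zero)
{
    assert(!(result_negative && result_zero));
    if (underflow) f->luf = true;  /* latched: set, never cleared here */
    if (overflow)  f->lv  = true;  /* latched: set, never cleared here */
    f->uf = underflow;             /* this instruction only */
    f->v  = overflow;              /* this instruction only */
    f->n  = result_negative;
    f->z  = result_zero;
    /* f->c is left unchanged */
}
```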

Example cmpf *+AR4,R6

Before Instruction:
AR4 = 80 98F2h
R6 = 070C80 0000h = 1.4050e+02
Data at 80 98F3h = 070C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
AR4 = 80 98F2h
R6 = 070C80 0000h = 1.4050e+02
Data at 80 98F3h = 070C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 1 0 0



Compare Floating-Point Values, 3 Operands CMPF3 



Syntax CMPF3 src2, src1

Operation src1 - src2

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
Encoding 



Type 1
31 24 23 16 15 8 7 0
0 0 1 0 0 0 1 1 0 | T | 0 0 0 0 0 | src1 | src2

Type 2
31 24 23 16 15 8 7 0
0 0 1 1 0 0 1 1 0 | T | 0 0 0 0 0 | src1 | src2



Instruction Word Fields

Type 1
T | src1 addressing modes | src2 addressing modes
00 | register mode (R0-R11) | register mode (R0-R11)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (R0-R11)
10 | register mode (R0-R11) | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T | src1 addressing modes | src2 addressing modes
01 | register mode (any CPU register) | indirect mode *+ARn (5-bit unsigned displacement)
11 | indirect mode *+ARn1 (5-bit unsigned displacement) | indirect mode *+ARn2 (5-bit unsigned displacement)

Description The src2 operand is subtracted from the src1 operand. The result is not
loaded into any register. This allows for nondestructive compares. The src1
and src2 operands are assumed to be floating-point numbers.

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Compare Integer CMPI 



Syntax CMPI src, dst

Operation dst - src

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding



31 24 23 16 15 8 7 0
0 0 0 0 0 1 0 0 1 | G | dst | src



Description The src operand is subtracted from the dst operand. The result is not loaded
into any register, thus allowing for nondestructive compares. The dst and
src operands are assumed to be signed integers.

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is not affected by OVM bit value. 
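The CMPI flag rules above can be expressed in C with 32-bit wraparound arithmetic. This is a sketch under stated assumptions: the IntFlags struct and function name are hypothetical, V is taken as ordinary twos-complement overflow of dst - src, and C is taken as the borrow of the same subtraction performed on unsigned values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag view; LUF is unaffected by CMPI, so it is omitted. */
typedef struct {
    bool lv, uf, n, z, v, c;
} IntFlags;

/* Model of CMPI src, dst: computes dst - src without storing it. */
void cmpi_flags(IntFlags *f, int32_t src, int32_t dst)
{
    assert(f);
    uint32_t a = (uint32_t)dst, b = (uint32_t)src;
    uint32_t r = a - b;                       /* wraps modulo 2^32 */
    /* Signed overflow: operand signs differ and result sign differs from dst. */
    bool overflow = (((a ^ b) & (a ^ r)) >> 31) != 0u;
    if (overflow) f->lv = true;               /* latched */
    f->v  = overflow;
    f->uf = false;
    f->n  = (r >> 31) != 0u;
    f->z  = (r == 0u);
    f->c  = a < b;                            /* borrow out of bit 31 */
}
```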

Example cmpi R3,R7

Before Instruction:
R3 = 898h = 2200
R7 = 3E8h = 1000
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
R3 = 898h = 2200
R7 = 3E8h = 1000
LUF LV UF N Z V C = 0 0 0 1 0 0 0
CMPI3 Compare Integer, 3 Operands



Syntax CMPI3 src2, src1
Operation src1 - src2

Operands src1, src2 both type 1 or type 2 three-operand addressing modes

Encoding

Type 1
31 24 23 16 15 8 7 0
0 0 1 0 0 0 1 1 1 | T | 0 0 0 0 0 | src1 | src2

Type 2
31 24 23 16 15 8 7 0
0 0 1 1 0 0 1 1 1 | T | 0 0 0 0 0 | src1 | src2



Instruction Word Fields

Type 1
T | src1 addressing modes | src2 addressing modes
00 | register mode (any CPU register) | register mode (any CPU register)
01 | indirect mode (disp = 0, 1, IR0, IR1) | register mode (any CPU register)
10 | register mode (any CPU register) | indirect mode (disp = 0, 1, IR0, IR1)
11 | indirect mode (disp = 0, 1, IR0, IR1) | indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T | src1 addressing modes | src2 addressing modes
00 | register mode (any CPU register) | 8-bit signed immediate
01 | register mode (any CPU register) | indirect mode *+ARn (5-bit unsigned displacement)
10 | indirect mode *+ARn (5-bit unsigned displacement) | 8-bit signed immediate
11 | indirect mode *+ARn1 (5-bit unsigned displacement) | indirect mode *+ARn2 (5-bit unsigned displacement)



Description The src2 operand is subtracted from the src1 operand. The result is not
loaded into any register. This allows for nondestructive compares. The src1
and src2 operands are assumed to be signed integers.

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C 1 if a borrow occurs, 0 otherwise. 



Mode Bit OVM Operation is not affected by OVM bit value. 
DBcond Decrement and Branch Conditionally (Standard)



Syntax DBcond ARn, src

Operation ARn - 1 -> ARn
If cond is true and ARn ≥ 0:
If src is in register addressing mode (any register in CPU primary
register file),
src -> PC.
If src is in PC-relative mode (label or address),
displacement + PC + 1 -> PC.
Else, continue.

Operands src conditional-branch addressing modes (B):
0 register
1 PC-relative

ARn register (any register in CPU primary register file)

Encoding

31 24 23 16 15 8 7 0
0 1 1 0 1 1 | B | ARn | 0 | cond | register or displacement



Description DBcond signifies a standard branch that executes in four cycles because 
the pipeline must be flushed if cond is true. The specified auxiliary register 
is decremented and a branch is performed if the condition is true and the 
specified auxiliary register is greater than or equal to zero. 

The auxiliary register is treated as a 32-bit signed integer. The most signifi- 
cant eight bits are unmodified by the decrement operation. The comparison 
of the auxiliary register uses only the 32 least significant bits of the auxiliary 
register. Note that the branch condition does not depend on the auxiliary 
register decrement. 

If the src operand is expressed in register addressing mode, the contents
of the specified register are loaded into the PC. If the src operand is
expressed in PC-relative addressing mode, the assembler generates a
displacement: displacement = label - (PC of branch instruction + 1). This
displacement is stored as a 16-bit signed integer in the 16 least significant
bits of the branch instruction word. This displacement is added to the PC of
the branch instruction plus 1 to generate the new PC.

The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).
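The decrement-then-test order of DBcond can be sketched in C as follows. The helper name is hypothetical, ARn is handled as a 32-bit signed integer as the description states, and the condition is passed in already evaluated, since it does not depend on the decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: always decrements *arn, then reports whether the
   branch is taken (condition true and decremented ARn >= 0). */
bool dbcond_taken(int32_t *arn, bool cond)
{
    assert(arn);
    *arn = (int32_t)((uint32_t)*arn - 1u);  /* 32-bit wraparound decrement */
    return cond && *arn >= 0;
}
```

Applied to the DBLT example (AR3 = 12h, N = 1, so LT is true), the branch is taken and AR3 becomes 11h. Starting from ARn = 3 with an always-true condition, the branch is taken three times (ARn = 2, 1, 0), so a loop closed by this instruction runs its body four times.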



Cycles 4 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example dblt ar3,R2

Before Instruction:
PC = 5Fh
AR3 = 12h
R2 = 9Fh
LUF LV UF N Z V C = 0 0 0 1 0 0 0

After Instruction:
PC = 9Fh
AR3 = 11h
R2 = 9Fh
LUF LV UF N Z V C = 0 0 0 1 0 0 0



DBcondD Decrement and Branch Conditionally (Delayed)



Syntax DBcondD ARn, src

Operation ARn - 1 -> ARn
If cond is true and ARn ≥ 0:
If src is in register addressing mode (any register in CPU primary
register file),
src -> PC.
If src is in PC-relative mode (label or address),
displacement + PC + 3 -> PC.
Else, continue.

Operands src conditional-branch addressing modes (B):
0 register
1 PC-relative

ARn register (any register in CPU primary register file)

Encoding

31 24 23 16 15 8 7 0
0 1 1 0 1 1 | B | ARn | 1 | cond | register or displacement



Description DBcondD signifies a delayed branch that allows the three instructions after
the delayed branch to be fetched before the PC is modified. The effect is a
single-cycle branch. The specified auxiliary register is decremented and a
branch is performed if the condition is true and the specified auxiliary
register is greater than or equal to zero. The three instructions following
DBcondD must not affect the condition.

The auxiliary register is treated as a 32-bit signed integer. The most signifi-
cant eight bits are unmodified by the decrement operation. The comparison
of the auxiliary register uses only the 32 least significant bits of the auxiliary
register. Note that the branch condition does not depend on the auxiliary
register decrement.

If the src operand is expressed in register addressing mode, the contents
of the specified register are loaded into the PC. If the src is expressed in
PC-relative addressing, the assembler generates a displacement:
displacement = label - (PC of branch instruction + 3). This displacement is
added to the PC of the branch instruction plus 3 to generate the new PC.
Note that bit 21 = 1 for a delayed branch.

The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).



Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example dbzd ar5, $+110h

Before Instruction:
PC = 0h
AR5 = 67h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
PC = 110h
AR5 = 66h
LUF LV UF N Z V C = 0 0 0 0 0 0 0



FIX Floating-Point to Integer Conversion



Syntax FIX src, dst

Operation fix(src) -> dst

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding



31 24 23 16 15 8 7 0
0 0 0 0 0 1 0 1 0 | G | dst | src







Description The floating-point operand src is converted to the nearest integer less than 
or equal to it in value, and the result is loaded into the dst register. The src 
operand is assumed to be a floating-point number and the dst operand a 
signed integer. 

The exponent field of the result register (if it has one) is not modified. 

Integer overflow occurs when the floating-point number is too large to be 
represented as a 32-bit twos-complement integer. In the case of integer 
overflow, the result will be saturated in the direction of overflow. 
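FIX rounds toward negative infinity (the nearest integer less than or equal to the operand) rather than toward zero, and saturates on overflow. A C sketch of that value mapping, assuming a host double as a stand-in for the TMS320C40 floating-point formats and not modeling the status bits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the FIX result: floor, then saturate to int32. */
int32_t c4x_fix(double x)
{
    assert(x == x);                          /* NaN has no C4x counterpart */
    if (x >= 2147483648.0) return INT32_MAX; /* positive overflow */
    if (x < -2147483648.0) return INT32_MIN; /* negative overflow */
    int64_t t = (int64_t)x;                  /* truncates toward zero */
    if ((double)t > x) t -= 1;               /* round negative non-integers down */
    return (int32_t)t;
}
```

For example, c4x_fix(-1.5) gives -2, whereas the C conversion (int32_t)-1.5 gives -1.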

Cycles 1 

Status Bits If ST(SET COND) = 0, the condition flags are modified only when the
destination register is R0-R11. If ST(SET COND) = 1, they are modified for
all destination registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Example fix r1,r2

Before Instruction:
R1 = 0A2820 0000h = 1.3454e+3
R2 = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
R1 = 0A2820 0000h = 1.3454e+3
R2 = 541h = 1345
LUF LV UF N Z V C = 0 0 0 0 0 0 0



FIX||STI Parallel FIX and STI 



Syntax FIX src2, dst1
|| STI src3, dst2

Operation fix(src2) -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding



31 24 23 16 15 8 7 0
1 1 0 1 0 1 0 | dst1 | 0 0 0 | src3 | dst2 | src2



Description A floating-point-to-integer conversion is performed. All registers are read at
the beginning and loaded at the end of the execute cycle. This means that
if one of the parallel operations (STI) reads from a register, and the operation
being performed in parallel (FIX) writes to the same register, then STI
accepts as input the contents of the register before it is modified by FIX.

If src2 and dst2 point to the same location, src2 is read before the write to 
dst2. 

Integer overflow occurs when the floating-point number is too large to be 
represented as a 32-bit twos-complement integer. In the case of integer 
overflow, the result will be saturated in the direction of overflow. 
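The read-before-write rule for FIX || STI can be illustrated with a small C model in which every operand is read before any result is written. The Cell union, the array-based register and memory model, and the function names are hypothetical, and host doubles stand in for the TMS320C40 floating-point format.

```c
#include <assert.h>
#include <stdint.h>

/* A memory word viewed as a float (FIX input) or an integer (STI output). */
typedef union { double f; int32_t i; } Cell;

/* Floor with saturation, as described for FIX. */
static int32_t fix_value(double x)
{
    assert(x == x);                          /* NaN not modeled */
    if (x >= 2147483648.0) return INT32_MAX;
    if (x < -2147483648.0) return INT32_MIN;
    int64_t t = (int64_t)x;                  /* truncate toward zero */
    return (int32_t)((double)t > x ? t - 1 : t);  /* then round down */
}

/* One execute cycle of FIX *src2, Rdst1 || STI Rsrc3, *dst2:
   all operands are read first, then all results are written. */
void fix_sti(int32_t r[8], Cell mem[], int src2, int dst1, int src3, int dst2)
{
    double  fix_in = mem[src2].f;            /* read phase */
    int32_t sti_in = r[src3];
    r[dst1] = fix_value(fix_in);             /* write phase */
    mem[dst2].i = sti_in;
}
```

When src3 and dst1 name the same register, the value STI stores is the register contents from before FIX overwrites it.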

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 






Example fix *++AR4(1),R1
|| STI R0,*AR2

Before Instruction:
AR4 = 80 98A2h
R1 = 0h
R0 = 0DCh = 220
AR2 = 80 983Ch
Data at 80 98A3h = 0733 C000h = 1.7975e+02
Data at 80 983Ch = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
AR4 = 80 98A3h
R1 = 0B3h = 179
R0 = 0DCh = 220
AR2 = 80 983Ch
Data at 80 98A3h = 0733 C000h = 1.7975e+02
Data at 80 983Ch = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0



FLOAT Integer to Floating-Point Conversion



Syntax FLOAT src, dst

Operation float(src) -> dst

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)



Encoding

31 24 23 16 15 8 7 0
0 0 0 0 0 1 0 1 1 | G | dst | src



Description The integer operand src is converted to the floating-point value equal to it,
and the result is loaded into the dst register. The src operand is assumed to
be a signed integer, and the dst operand a floating-point number.



Cycles 1

Status Bits LUF Unaffected.

LV Unaffected.

UF 0.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 0.

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example float *++ar2(2),R5

Before Instruction:
AR2 = 80 9800h
R5 = 034C 2000h = 1.27578125e+01
Data at 80 9802h = 0AEh = 174
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
AR2 = 80 9802h
R5 = 072E00 0000h = 1.74e+02
Data at 80 9802h = 0AEh = 174
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Parallel FLOAT and STF FLOAT||STF



Syntax FLOAT src2, dst1
|| STF src3, dst2

Operation float(src2) -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding



31 24 23 16 15 8 7 0
1 1 0 1 0 1 1 | dst1 | 0 0 0 | src3 | dst2 | src2



Description An integer-to-floating-point conversion is performed. All registers are read 
at the beginning and loaded at the end of the execute cycle. This means that 
if one of the parallel operations (STF) reads from a register and the opera- 
tion being performed in parallel (FLOAT) writes to the same register, then 
STF accepts as input the contents of the register before it is modified by 
FLOAT. 

If src2 and dst2 point to the same location, src2 is read before the write to 
dst2. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value.






Example FLOAT *+AR2(IR0),R6
|| STF R7,*AR1

Before Instruction:
AR2 = 80 98C5h
IR0 = 8h
R6 = 0h
R7 = 034C20 0000h = 1.27578125e+01
AR1 = 80 9933h
Data at 80 98CDh = 0AEh = 174
Data at 80 9933h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
AR2 = 80 98C5h
IR0 = 8h
R6 = 072E00 0000h = 1.740e+02
R7 = 034C20 0000h = 1.27578125e+01
AR1 = 80 9933h
Data at 80 98CDh = 0AEh = 174
Data at 80 9933h = 034C 2000h = 1.27578125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Convert From IEEE Format FRIEEE 



Syntax FRIEEE src, dst

Operation Convert src from IEEE format -> dst

Operands src direct or indirect addressing modes
dst extended-precision register (R0-R11)

Encoding



31 24 23 16 15 8 7 0
0 0 0 1 1 1 0 0 0 | G | dst | src







Instruction Word Fields

G src addressing modes
01 direct mode
10 indirect mode



Description The src operand is converted from the IEEE floating-point format to the 
twos-complement floating-point format. 

The src operand comes from memory. The converted result goes into an
extended-precision register as a single-precision floating-point number.
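The conversion FRIEEE performs can be sketched in C for the common cases. The sketch assumes the TMS320C40 single-precision layout from Chapter 4 (8-bit twos-complement exponent in bits 31-24, sign in bit 23, 23-bit fraction, zero encoded with exponent -128). Flushing IEEE denormals to zero and rejecting infinities and NaNs are choices of this sketch, not statements of what the hardware does, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: IEEE 754 single bit pattern -> C4x single bit pattern.
   Returns 1 on success, 0 for Inf/NaN (not converted by this sketch). */
int ieee_to_c4x(uint32_t ieee, uint32_t *out)
{
    assert(out);
    uint32_t sign = ieee >> 31;
    uint32_t bexp = (ieee >> 23) & 0xFFu;
    uint32_t frac = ieee & 0x7FFFFFu;

    if (bexp == 0xFFu) return 0;             /* Inf or NaN: out of scope */
    if (bexp == 0u) {                        /* zero or denormal */
        *out = 0x80000000u;                  /* exponent -128 encodes zero */
        return 1;
    }
    int32_t e = (int32_t)bexp - 127;         /* unbiased exponent */
    if (!sign) {
        /* Positive: value = 01.f x 2^e, so the fraction carries over. */
        *out = ((uint32_t)(e & 0xFF) << 24) | frac;
    } else if (frac == 0u) {
        /* -1.0 x 2^e equals -2 x 2^(e-1): sign set, fraction zero. */
        *out = ((uint32_t)((e - 1) & 0xFF) << 24) | 0x800000u;
    } else {
        /* -(1.f) = -2 + (1 - 0.f): store the complemented fraction. */
        *out = ((uint32_t)(e & 0xFF) << 24) | 0x800000u | (0x800000u - frac);
    }
    return 1;
}
```

For example, IEEE 174.0 (432E0000h) maps to 072E0000h, which matches the upper 32 bits of the R5 result in the FLOAT example.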



Cycles 1



Status Bits LUF Unaffected. 

LV Set if overflow, otherwise unchanged. 

UF 0. 

N Sign of the result. 

Z 1 if result is 0, 0 otherwise. 

V 1 if overflow, 0 otherwise. 

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value. 



FRIEEE||STF Parallel FRIEEE and STF

Syntax FRIEEE src2, dst1
|| STF src3, dst2

Operation Convert src2 from IEEE format -> dst1
in parallel with
src3 -> dst2

Operands src2 indirect mode (disp = 0, 1, IR0, IR1)
dst1 register mode (R0-R7)
src3 register mode (R0-R7)
dst2 indirect mode (disp = 0, 1, IR0, IR1)

Encoding



31 24 23 16 15 8 7 0
1 1 1 1 0 0 1 | dst1 | 0 0 0 | src3 | dst2 | src2



Description The src2 operand is converted from the IEEE floating-point format to the
twos-complement format. The converted result goes into the extended-precision
register dst1 as a single-precision floating-point number.

A floating-point store is done in parallel. 

If src2 and dst2 point to the same location, then src2 is read before the write 
to dst2. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Set if overflow, otherwise unchanged. 

UF 0. 

N Sign of the result. 

Z 1 if result is 0, 0 otherwise. 

V 1 if overflow, 0 otherwise. 

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value. 



Interrupt Acknowledge IACK 



Syntax IACK src

Operation Perform a dummy read operation with IACK = 0.
At end of dummy read, set IACK to 1.

Operands src general addressing modes (G):
0 1 direct
1 0 indirect



Encoding

31 24 23 16 15 8 7 0
0 0 0 1 1 0 1 1 0 | G | 0 0 0 0 0 | src





Description A dummy read operation is performed with IACK = 0. At the end of the
dummy read, IACK is set to 1 if off-chip memory is specified. This instruction
can be used to generate an external interrupt acknowledge. If the address
specified is off-chip, a read operation from that address is performed. The
IACK signal and the address can then be used to signal interrupt
acknowledge to external devices. The data read by the processor is unused.



Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 

Example iack *ar5

Before Instruction:
IACK = 1
PC = 300h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:
IACK = 1
PC = 301h
LUF LV UF N Z V C = 0 0 0 0 0 0 0
IDLE Idle Until Interrupt



Syntax IDLE 

Operation 1 -> ST(GIE)
Next PC -> PC
Idle until interrupt.

Operands None 

Encoding 

31 24 23 16 15 87 



t — r 

0 0 0 



1 — I — I — I — I — I — I — I — I — I — I — I — I — I — T — I — I — I — I — I — I — I 

0 0 0 0 0 0 0 00000 00000000000 



I I I I I 

0 0 110 0 



Description The global interrupt enable bit is set, the next PC value is loaded into the 
PC, and the CPU idles until an interrupt is received. When the interrupt is 
received, the contents of the PC are pushed onto the active system stack. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Link and Jump LAJ 



Syntax LAJ src 

Operation PC of LAJ + 4 -> extended-precision register R11
src + 3 + PC of LAJ -> PC

Operands src 24-bit signed immediate displacement

Encoding

31 24 23 16 15 8 7 0
0 1 1 0 0 0 1 1 | src (displacement)



Description LAJ performs a single-cycle subroutine call. The three instructions following
the LAJ instruction are executed. The return address (address of the LAJ
instruction + 4) is placed in extended-precision register R11. The address
branched to is formed by adding the src operand to the PC of the LAJ
instruction + 3.

None of the three instructions following the LAJ instruction should modify 
the program flow. Interrupts are disabled for the duration of the LAJ in- 
struction. 
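The two values LAJ produces, the link address in R11 and the branch target, can be modeled in C as follows. This is a hypothetical sketch: the struct and function names are illustrative, the 24-bit field is sign-extended as for BR and BRD, and the three delay-slot instructions that execute before the new PC takes effect are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state: only the values LAJ writes. */
typedef struct {
    uint32_t pc;
    uint32_t r11;
} LajState;

/* LAJ src: R11 <- PC of LAJ + 4, PC <- PC of LAJ + 3 + src. */
void laj(LajState *s, uint32_t field24)
{
    assert((field24 & 0xFF000000u) == 0u);   /* 24-bit displacement field */
    int32_t disp = (field24 & 0x800000u) ? (int32_t)(field24 | 0xFF000000u)
                                         : (int32_t)field24;
    s->r11 = s->pc + 4u;                     /* return address past the delay slots */
    s->pc  = s->pc + 3u + (uint32_t)disp;    /* branch target */
}
```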

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






LAJcond Link and Jump Conditionally



Syntax LAJcond src 

Operation If (cond is true)
If (src is a register)
PC of LAJcond + 4 -> extended-precision register R11
src -> PC
If (src is a displacement)
PC of LAJcond + 4 -> extended-precision register R11
src + PC of LAJcond + 3 -> PC
Else, continue.

Operands src conditional-branch addressing modes



Encoding
  [Bit-field diagram, bits 31-0: fixed opcode bits, B, cond, and register or displacement]

Instruction Word Fields

  B    src addressing mode
  0    register mode
  1    PC-relative mode



Description LAJcond performs a conditional, single-cycle, delayed subroutine
call. The three instructions following LAJcond are executed. The return
address (the address of the LAJcond instruction + 4) is placed in
extended-precision register R11. The branch address is formed in either
register mode or PC-relative mode, as selected by the B field.

None of the three instructions following LAJcond may modify the program flow.
Interrupts are disabled for the duration of the LAJcond instruction.
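The two ways of forming the branch address can be sketched as follows. This Python model is illustrative only: it assumes a 16-bit signed displacement in PC-relative mode, represents the register file as a hypothetical `registers` dictionary keyed by register number, and assumes that a false condition simply continues at PC + 4 after the three following instructions.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def lajcond(pc: int, cond_true: bool, b: int, field: int,
            registers: dict[int, int]) -> tuple[Optional[int], int]:
    """Model LAJcond: return (value written to R11 or None, next PC after the delay)."""
    if not cond_true:
        return None, (pc + 4) & MASK32          # condition false: continue in sequence
    if b == 0:                                  # register mode: field names a register
        target = registers[field] & MASK32
    else:                                       # PC-relative mode: field is a displacement
        target = (pc + 3 + sign_extend(field, 16)) & MASK32
    return (pc + 4) & MASK32, target


print(lajcond(0x200, True, 0, 5, {5: 0x8000}))  # (516, 32768)
```

Keeping the register file as a plain dictionary keeps the sketch focused on the B-field selection rather than on register encoding.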



Cycles 1

Status Bits LUF Unaffected.

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Link and Trap Conditionally LATcond 



Syntax LATcond N

Operation If (cond is true):
    ST(GIE) -> ST(PGIE)
    ST(CF) -> ST(PCF)
    0 -> ST(GIE)
    1 -> ST(CF)
    PC of LATcond + 4 -> extended-precision register R11
    trap vector N -> PC
Else, continue.

Operands N immediate mode: trap number (0 ≤ N ≤ 511)

Encoding
  [Bit-field diagram, bits 31-0: fixed opcode bits, cond, seven 0 bits, and N]



Description LATcond performs a delayed conditional trap. If traps are to be
nested, you may need to save the status register before executing LATcond.
If the condition is true, the GIE and CF bits of ST are saved in the PGIE and
PCF bits of ST. Then all interrupts are disabled (0 -> GIE), and the cache is
frozen (1 -> CF). The PC of the LATcond + 4 is placed in R11, and the PC is
loaded with the contents of the specified trap vector (N). If the condition
is false, normal operation continues.

The three instructions following LATcond are fetched and executed. They must
not modify the program flow or the status register.
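The status-register bookkeeping described above can be sketched in Python. This is a behavioral model only: `StatusBits` holds just the four ST bits that LATcond touches (their bit positions are not modeled), and the hypothetical `trap_vectors` list stands in for the trap-vector table contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusBits:
    """Only the ST bits that LATcond touches; bit positions are not modeled."""
    gie: int = 1   # global interrupt enable
    cf: int = 0    # cache freeze
    pgie: int = 0  # previous GIE
    pcf: int = 0   # previous CF


def latcond(pc: int, cond_true: bool, n: int, st: StatusBits,
            trap_vectors: list[int]) -> Optional[tuple[int, int]]:
    """Model LATcond N: return (value written to R11, new PC), or None if cond is false."""
    if not 0 <= n <= 511:
        raise ValueError("trap number N must be in 0..511")
    if not cond_true:
        return None                       # normal operation continues
    st.pgie, st.pcf = st.gie, st.cf       # save GIE and CF
    st.gie, st.cf = 0, 1                  # disable interrupts, freeze the cache
    return pc + 4, trap_vectors[n]        # R11 <- PC + 4; PC <- trap vector N


st = StatusBits()
print(latcond(0x300, True, 2, st, [0x10, 0x20, 0x30]), st)
```

Because the previous GIE and CF values are kept only in PGIE and PCF, a nested trap would overwrite them, which is why the text suggests saving ST first.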

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
LBb Load Byte



Syntax LBb src, dst 

Operation Sign-extended byte (3, 2, 1, 0) of src -> dst
b = byte to load (3, 2, 1, 0)

| 3 | 2 | 1 | 0 | = b (byte designator 3-0)

Operands src register, direct, or indirect addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
  Bits 31-25: 1 0 1 1 0 0 0
  Bits 24-23: B
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src







Instruction Word Fields

  G    src addressing mode
  00   register mode
  01   direct mode
  10   indirect mode

  B    src byte
  00   byte 0 (LS byte)
  01   byte 1
  10   byte 2
  11   byte 3 (MS byte)



Description The specified byte of the src operand is sign-extended and right-shifted into 
the 8 LSBs of the dst register. The src byte is signed. 
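The byte selection and sign extension can be checked with a short Python model. This is a sketch that assumes a 32-bit src value; the function name `lb` is hypothetical.

```python
def lb(src: int, b: int) -> int:
    """Model LBb: sign-extend byte b (0 = LS byte, 3 = MS byte) of a 32-bit src."""
    if b not in (0, 1, 2, 3):
        raise ValueError("byte designator b must be 0-3")
    byte = (src >> (8 * b)) & 0xFF        # right-shift the selected byte into the 8 LSBs
    if byte & 0x80:                       # negative byte: fill the upper 24 bits with 1s
        byte |= 0xFFFFFF00
    return byte


# The LB2 example for this instruction: byte 2 of 00AB0000h is ABh.
print(hex(lb(0x00AB0000, 2)))  # 0xffffffab
```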

Cycles 1 



Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LB2 R1, R2 ; sign-extended byte 2 of R1 -> R2

Before Instruction:

R1 = 00AB 0000h
R2 = 0000 0000h

After Instruction:

R1 = 00AB 0000h
R2 = FFFF FFABh



LBUb Load Byte Unsigned



Syntax LBUb src, dst 
Operation Byte (3, 2, 1, 0) of src -> dst
b = byte to load (3, 2, 1, 0)

| 3 | 2 | 1 | 0 | = b (byte designator 3-0)

Operands src register, direct, or indirect addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
  Bits 31-25: 1 0 1 1 0 0 1
  Bits 24-23: B
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Instruction Word Fields

  G    src addressing mode
  00   register mode (any CPU register)
  01   direct mode
  10   indirect mode

  B    src byte
  00   byte 0 (LS byte)
  01   byte 1
  10   byte 2
  11   byte 3 (MS byte)



Description The specified byte of the src operand is right-shifted, without sign
extension, into the 8 LSBs of the dst register; the remaining bits of dst are
filled with zeros. The src byte is unsigned.
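The unsigned variant differs from LBb only in how the upper bits are filled. A minimal Python sketch, assuming a 32-bit src; `lbu` is a hypothetical name.

```python
def lbu(src: int, b: int) -> int:
    """Model LBUb: zero-extend byte b (0 = LS byte, 3 = MS byte) of a 32-bit src."""
    if b not in (0, 1, 2, 3):
        raise ValueError("byte designator b must be 0-3")
    return (src >> (8 * b)) & 0xFF        # upper 24 bits of the result are 0


print(hex(lbu(0x00AB0000, 2)))  # 0xab
```

Since the result never has bit 31 set, the N flag is always 0, as the status bits for this instruction state.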

Cycles 1 

Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 
N 0. 

Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LBU2 R1, R2

Before Instruction:

R1 = 00AB 0000h
R2 = 0000 0000h

After Instruction:

R1 = 00AB 0000h
R2 = 0000 00ABh



LDA Load Address Register



Syntax LDA src, dst

Operation src -> dst

Operands src general addressing modes 

dst register mode (address registers only) 

Encoding
  Bits 31-23: 0 0 0 1 1 1 1 0 1
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src







Instruction Word Fields

  G    src addressing mode
  00   register mode (any CPU register)
  01   direct mode
  10   indirect mode
  11   immediate mode



Description The src operand is loaded into the dst register. The dst register may be
any of the address registers: AR0-AR7, IR0, IR1, DP, BK, or SP. The load is
completed by the end of the read phase of the pipeline; as a result, LDA is one
cycle faster than LDI for loading these registers. All operands are treated
as signed integers.

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Load Floating-Point Exponent LDE 



Syntax LDE src, dst 
Operation src(exp) -> dst(exp)

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)



Encoding
  Bits 31-23: 0 0 0 0 0 1 1 0 1
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src







Description The exponent field of the src operand is loaded into the exponent field of the
dst register. The mantissa field of the dst register is not modified unless the
exponent loaded is the reserved exponent value for zero, as determined by the
precision of the src operand; in that case, the mantissa field of the dst
register is set to zero. The src and dst operands are assumed to be
floating-point numbers. Immediate values are evaluated in the short
floating-point format.
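The exponent transfer can be reproduced on raw register images. This Python sketch assumes the 40-bit extended-precision layout (8-bit two's-complement exponent in bits 39-32, sign in bit 31, 31-bit fraction) with exponent -128 (80h) reserved for zero; it models register sources only, not the short-format immediate case. The names `lde` and `to_float` are hypothetical.

```python
EXP_ZERO = 0x80          # reserved exponent (-128) that encodes 0.0


def lde(src40: int, dst40: int) -> int:
    """Model LDE on 40-bit registers: copy src's exponent (bits 39-32) into dst."""
    exp = (src40 >> 32) & 0xFF
    mantissa = dst40 & 0xFFFFFFFF
    if exp == EXP_ZERO:                  # loading the zero exponent also clears the mantissa
        mantissa = 0
    return (exp << 32) | mantissa


def to_float(reg40: int) -> float:
    """Decode a 40-bit extended-precision value: 8-bit exponent, sign, 31-bit fraction."""
    exp = (reg40 >> 32) & 0xFF
    if exp >= 0x80:
        exp -= 0x100                     # exponent is two's complement
    if exp == -128:
        return 0.0
    mantissa = reg40 & 0xFFFFFFFF
    frac = (mantissa & 0x7FFFFFFF) / 2**31
    lead = -2.0 if mantissa & 0x80000000 else 1.0   # 10.f (negative) or 01.f (positive)
    return (lead + frac) * 2.0**exp


r5 = lde(0x0200056F30, 0x0A056FE332)    # LDE R0,R5 from the example
print(hex(r5), to_float(r5))
```

Applied to the example that follows, the model keeps R5's mantissa and replaces its exponent 0Ah with R0's exponent 02h.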

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example LDE R0,R5

Before Instruction:

R0 = 020005 6F30h = 4.00066337e+00
R5 = 0A056F E332h = 1.06749648e+03
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 020005 6F30h = 4.00066337e+00
R5 = 02056F E332h = 4.16990814e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LDEP Load Integer From Expansion Register File to Primary Register File



Syntax LDEP src, dst 
Operation src -> dst

Operands src expansion register file register (IVTP or TVTP)

dst register mode (any register in CPU primary register file) 

Encoding
  Bits 31-23: 0 1 1 1 0 1 1 0 0
  Bits 22-21: 0 0
  Bits 20-16: dst
  Bits 15-0:  0s, with src in the low-order bits



Description LDEP loads a CPU register with the contents of the IVTP register
(interrupt-vector table pointer) or the TVTP register (trap-vector table
pointer). These registers are described in Section 3.2.

The src operand register from the expansion register file is loaded into the
dst register in the primary register file. The dst register content is assumed
to be an integer.

Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Load Floating-Point Value LDF 



Syntax LDF src, dst

Operation src -> dst

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)



Encoding
  Bits 31-23: 0 0 0 0 0 1 1 1 0
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src







Description The src operand is loaded into the dst register. The dst and src operands 
are assumed to be floating-point numbers. 

Cycles 1 



Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is loaded, 0 otherwise.
Z 1 if a zero result is loaded, 0 otherwise.
V 0.
C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 

Example LDF @9800h,R2

Before Instruction:

DP = 80h
R2 = 0h
Data at 80 9800h = 10C 52A0h = 2.19254303e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R2 = 010C52 A000h = 2.19254303e+00
Data at 80 9800h = 10C 52A0h = 2.19254303e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LDFcond Load Floating-Point Value Conditionally



Syntax LDFcond src, dst

Operation If cond is true:
src -> dst
Else:
dst is unchanged.

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)



Encoding
  Bits 31-28: 0 1 0 0
  Bits 27-23: cond
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Description If the condition is true, the src operand is loaded into the dst register.
Otherwise, the dst register is unchanged. The dst and src operands are assumed
to be floating-point numbers.

The TMS320C40 provides 20 condition codes that can be used with this instruction
(see Section 11.2 on page 11-10 for a list of condition mnemonics, encoding,
and flags). Note that an LDFU (load floating-point unconditionally) instruction
is useful for loading R0-R11 without affecting the condition flags.

Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example LDFZ R3,R5

Before Instruction:

R3 = 2CFF2C D500h = -1.77055560e+13
R5 = 5F0000 003Eh = 3.96140824e+28
LUF LV UF N Z V C = 0 0 0 0 1 0 0

After Instruction:

R3 = 2CFF2C D500h = -1.77055560e+13
R5 = 2CFF2C D500h = -1.77055560e+13
LUF LV UF N Z V C = 0 0 0 0 1 0 0
LDFI Load Floating-Point Value, Interlocked



Syntax LDFI src, dst

Operation Signal interlocked operation.
src -> dst

Operands src general addressing modes (G):
0 1 direct
1 0 indirect

dst register (R0-R11)



Encoding
  Bits 31-23: 0 0 0 0 0 1 1 1 1
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src







Description The src operand is loaded into the dst register. An interlocked operation is
signaled over LOCK or LLOCK. The src and dst operands are assumed to be
floating-point numbers. Only the direct and indirect addressing modes are
allowed. Refer to Section 6.5 (page 6-13) and Section 7.7 (page 7-39) for
detailed descriptions.

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LDFI *+AR2,R7

Before Instruction:

AR2 = 8098F1h
R7 = 0h
Data at 80 98F2h = 584 C000h = -6.28125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR2 = 8098F1h
R7 = 0584C0 0000h = -6.28125e+01
Data at 80 98F2h = 584 C000h = -6.28125e+01
LUF LV UF N Z V C = 0 0 0 1 0 0 0
Parallel LDF and LDF LDF||LDF

Syntax LDF src2, dst2
|| LDF src1, dst1

Operation src2 -> dst2
|| src1 -> dst1

Operands src1 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst2 register (R0-R7)

Encoding
  Bits 31-30: 1 1
  Bits 29-25: 0 0 0 1 0
  Bits 24-22: dst2
  Bits 21-19: dst1
  Bits 18-16: 0 0 0
  Bits 15-8:  src1
  Bits 7-0:   src2



Description Two floating-point loads are performed in parallel. If the LDFs load the same
register, the assembler issues a warning, and the result is that of LDF src2, dst2.



Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






Example LDF *--AR1(IR0),R7
|| LDF *AR7++(1),R3

Before Instruction:

AR1 = 80 985Fh
IR0 = 8h
R7 = 0h
AR7 = 80 988Ah
R3 = 0h
Data at 80 9857h = 70C 8000h = 1.4050e+02
Data at 80 988Ah = 57B 4000h = 6.281250e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 9857h
IR0 = 8h
R7 = 070C80 0000h = 1.4050e+02
AR7 = 80 988Bh
R3 = 057B40 0000h = 6.281250e+01
Data at 80 9857h = 70C 8000h = 1.4050e+02
Data at 80 988Ah = 57B 4000h = 6.281250e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0
Parallel LDF and STF LDF||STF



Syntax LDF src2, dst1
|| STF src3, dst2

Operation src2 -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding
  Bits 31-30: 1 1
  Bits 29-25: 0 1 1 0 0
  Bits 24-22: dst1
  Bits 21-19: 0 0 0
  Bits 18-16: src3
  Bits 15-8:  dst2
  Bits 7-0:   src2



Description A floating-point load and a floating-point store are performed in parallel.
If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example LDF *AR2--(1),R1
|| STF R3,*AR4++(IR1)

Before Instruction:

AR2 = 80 98E7h
R1 = 0h
R3 = 057B40 0000h = 6.28125e+01
AR4 = 80 9900h
IR1 = 10h
Data at 80 98E7h = 70C 8000h = 1.4050e+02
Data at 80 9900h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR2 = 80 98E6h
R1 = 070C80 0000h = 1.4050e+02
R3 = 057B40 0000h = 6.28125e+01
AR4 = 80 9910h
IR1 = 10h
Data at 80 98E7h = 70C 8000h = 1.4050e+02
Data at 80 9900h = 57B 4000h = 6.28125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Load 16 MSBs With 16-Bit Immediate LDHI 



Syntax LDHI src, dst 

Operation src -> 16 MSBs of dst

Operands src 16-bit unsigned immediate
dst register mode

Encoding
  Bits 31-23: 0 0 0 1 1 1 1 1 1
  Bits 22-21: 1 1
  Bits 20-16: dst
  Bits 15-0:  src (immediate value)





Description The 16-bit unsigned src immediate value is loaded into the 16 MSBs of the
dst register, and 0 is loaded into the 16 LSBs of the dst register. The dst
register is assumed to be an integer.
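In integer terms, LDHI is a 16-bit left shift of the immediate. A minimal Python sketch (the name `ldhi` is hypothetical):

```python
def ldhi(imm16: int) -> int:
    """Model LDHI: place a 16-bit unsigned immediate in the 16 MSBs, clear the 16 LSBs."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("src must be a 16-bit unsigned immediate")
    return imm16 << 16


print(f"{ldhi(0x44):08X}h")  # 00440000h
```

Any previous contents of dst, such as ABCD EF12h in the example for this instruction, are overwritten completely.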

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value.

Example LDHI 44h, R2

Before Instruction:

R2 = ABCD EF12h

After Instruction:

R2 = 0044 0000h
LDI Load Integer



Syntax LDI src, dst

Operation src -> dst

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding
  Bits 31-23: 0 0 0 0 1 0 0 0 0
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Description The src operand is loaded into the dst register. The dst and src operands
are assumed to be signed integers. An alternate form of LDI, LDP, is used
to load the data page pointer register (DP) or any other register with the 16
MSBs of a relocatable address. See the LDP instruction in this chapter and
subsection 11.3.2 (on page 11-15).

Cycles 1



Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 



Example LDI *-AR1(IR0),R5

Before Instruction:

AR1 = 2Ch
IR0 = 5h
R5 = 3C5h = 965
Data at 27h = 26h = 38
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 2Ch
IR0 = 5h
R5 = 26h = 38
Data at 27h = 26h = 38
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LDIcond Load Integer Conditionally



Syntax LDIcond src, dst

Operation If cond is true:
src -> dst
Else:
dst is unchanged.

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding
  Bits 31-28: 0 1 0 1
  Bits 27-23: cond
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Description If the condition is true, the src operand is loaded into the dst register.
Otherwise, the dst register is unchanged. The dst and src operands are assumed
to be signed integers.

The TMS320C40 provides 20 condition codes that can be used with this instruction
(see Section 11.2 on page 11-10 for a list of condition mnemonics, encoding,
and flags). Note that an LDIU (load integer unconditionally) instruction is
useful for loading a selected CPU register without affecting the condition
flags.
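The conditional load reads the flags but never writes them. The Python sketch below models only three condition codes (U, Z, and NZ) as an illustrative subset; Section 11.2 defines the full set of 20. The names `CONDITIONS` and `ldi_cond` are hypothetical.

```python
# Illustrative subset of the condition codes; see Section 11.2 for all 20.
CONDITIONS = {
    "U": lambda f: True,          # unconditional
    "Z": lambda f: f["Z"] == 1,   # zero
    "NZ": lambda f: f["Z"] == 0,  # not zero
}


def ldi_cond(cond: str, flags: dict[str, int], src: int, dst: int) -> int:
    """Model LDIcond: return the new dst value; the flags are read but never written."""
    return src if CONDITIONS[cond](flags) else dst


# LDIZ R4,R6 with Z = 0, as in the example for this instruction.
flags = {"LUF": 0, "LV": 0, "UF": 0, "N": 0, "Z": 0, "V": 0, "C": 0}
print(hex(ldi_cond("Z", flags, src=0x027C, dst=0x0FE2)))  # 0xfe2
```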



Cycles 1

Status Bits LUF Unaffected.

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example LDIZ R4,R6

Before Instruction:

R4 = 027Ch = 636
R6 = 0FE2h = 4,066
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R4 = 027Ch = 636
R6 = 0FE2h = 4,066
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LDII Load Integer, Interlocked



Syntax LDII src, dst

Operation Signal interlocked operation.
src -> dst

Operands src general addressing modes (G):
0 1 direct
1 0 indirect

dst register (any register in CPU primary register file)

Encoding
  Bits 31-23: 0 0 0 0 1 0 0 0 1
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Description The src operand is loaded into the dst register. An interlocked operation is
signaled over LOCK or LLOCK. The src and dst operands are assumed to be
signed integers. Only the direct and indirect addressing modes are allowed.
Refer to Section 7.7 on page 7-39 for a detailed description.

Cycles 1 

Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LDII @985Fh,R3

Before Instruction:

DP = 80h
R3 = 0h
Data at 80 985Fh = 0DCh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R3 = 0DCh
Data at 80 985Fh = 0DCh
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Parallel LDI and LDI LDI||LDI 



Syntax LDI src2, dst2
|| LDI src1, dst1

Operation src2 -> dst2
|| src1 -> dst1

Operands src1 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst2 register (R0-R7)



Encoding
  Bits 31-30: 1 1
  Bits 29-25: 0 0 0 1 1
  Bits 24-22: dst2
  Bits 21-19: dst1
  Bits 18-16: 0 0 0
  Bits 15-8:  src1
  Bits 7-0:   src2



Description Two integer loads are performed in parallel. A warning is issued by the as- 
sembler if the LDIs load the same register. The result is that of LDI src2, dst2. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Example LDI *-AR1(1),R7
|| LDI *AR7++(IR0),R1

Before Instruction:

AR1 = 80 9826h
R7 = 0h
AR7 = 80 98C8h
IR0 = 10h
R1 = 0h
Data at 80 9825h = 0FAh = 250
Data at 80 98C8h = 2EEh = 750
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 9826h
R7 = 0FAh = 250
AR7 = 80 98D8h
IR0 = 10h
R1 = 02EEh = 750
Data at 80 9825h = 0FAh = 250
Data at 80 98C8h = 2EEh = 750
LUF LV UF N Z V C = 0 0 0 0 0 0 0
Parallel LDI and STI LDI||STI



Syntax LDI src2, dst1
|| STI src3, dst2

Operation src2 -> dst1
|| src3 -> dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
  Bits 31-30: 1 1
  Bits 29-25: 0 1 1 0 1
  Bits 24-22: dst1
  Bits 21-19: 0 0 0
  Bits 18-16: src3
  Bits 15-8:  dst2
  Bits 7-0:   src2



Description An integer load and an integer store are performed in parallel. If src2 and 
dst2 point to the same location, src2 is read before the write to dst2. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






Example LDI *-AR1(1),R2
|| STI R7,*AR5++(IR0)

Before Instruction:

AR1 = 80 98E7h
R2 = 0h
R7 = 35h = 53
AR5 = 80 982Ch
IR0 = 8h
Data at 80 98E6h = 0DCh = 220
Data at 80 982Ch = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 98E7h
R2 = 0DCh = 220
R7 = 35h = 53
AR5 = 80 9834h
IR0 = 8h
Data at 80 98E6h = 0DCh = 220
Data at 80 982Ch = 35h = 53
LUF LV UF N Z V C = 0 0 0 0 0 0 0
Load Floating-Point Mantissa LDM



Syntax LDM src, dst

Operation src(man) -> dst(man)

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)

Encoding
  Bits 31-23: 0 0 0 0 1 0 0 1 0
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src



Description The mantissa field of the src operand is loaded into the mantissa field of the
dst register. The dst exponent field is not modified. The src and dst operands
are assumed to be floating-point numbers. If the immediate addressing mode is
used, bits 15-12 of the instruction word (the exponent field of the short
floating-point immediate) are forced to 0 by the assembler. If the source is
in memory, the 32-bit data word is loaded into the mantissa field.
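The register and immediate forms can be sketched as follows. This Python model assumes the 40-bit register layout (exponent in bits 39-32, mantissa in bits 31-0) and the short floating-point layout (4-bit exponent in bits 15-12, sign in bit 11, 11-bit fraction), with the immediate's sign and fraction left-justified into the 32-bit mantissa; the names `ldm` and `ldm_immediate` are hypothetical.

```python
def ldm(src40: int, dst40: int) -> int:
    """Model LDM with a register src: keep dst's exponent, take src's 32-bit mantissa."""
    return (dst40 & 0xFF_0000_0000) | (src40 & 0xFFFF_FFFF)


def ldm_immediate(imm16: int, dst40: int) -> int:
    """Model LDM with a short floating-point immediate.

    Bits 15-12 (the exponent) are ignored, because the assembler forces them to 0;
    the sign and 11-bit fraction in bits 11-0 are left-justified into the mantissa.
    """
    mantissa = (imm16 & 0x0FFF) << 20
    return (dst40 & 0xFF_0000_0000) | mantissa


# 156.75 as a short float is 71CCh: exponent 7, sign 0, fraction 1CCh.
print(hex(ldm_immediate(0x71CC, 0x0)))  # 0x1cc00000, i.e., R2 = 00 1CC00000h
```

With R2 initially 0, both forms reproduce the result of the LDM 156.75,R2 example: the exponent stays 00h and only the mantissa 1CC00000h is copied, which is why R2 reads as about 1.2246 rather than 156.75.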



Cycles 1

Status Bits LUF Unaffected.

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 
Example LDM 156.75,R2 (156.75 = 07 1CC00000h)

Before Instruction:

R2 = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R2 = 001CC0 0000h = 1.22460938e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LDP Load Data Page Pointer



Syntax LDP src[, DP] 
Operation src -> data-page pointer

Operands src is the 16 MSBs of the absolute 32-bit source address (src).

dst is optional (the data-page pointer is understood if ",DP" is left out of the operand).

Encoding
  Bits 31-23: 0 0 0 0 1 0 0 0 0
  Bits 22-21: 1 1
  Bits 20-16: 1 0 0 0 0
  Bits 15-0:  src





Description This pseudo-op is an alternate form of the LDI instruction, except that LDP
always uses the immediate addressing mode (bits 22-21 = 11). The 16 MSBs of
the src absolute 32-bit value are loaded into the 16 LSBs of the data-page
pointer; a src shorter than 32 bits is zero-filled to 32 bits first. For
example, any src of 16 bits or fewer loads 0 into the DP, because its 16 MSBs
are the zeros used to extend it to 32 bits.

The 16 LSBs of the pointer are used in direct addressing as a pointer to the
page of data being addressed. There are 65,536 (64K) pages, each 64K words
long. Bits 31-16 of the pointer are reserved and should be kept at zero.
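The DP value produced by LDP, and its later use in direct addressing, can be sketched in Python. This is an illustrative model; `ldp` and `direct_address` are hypothetical names, and `direct_address` simply concatenates the page number with a 16-bit offset from the instruction word, which matches the LDF @9800h example (DP = 80h, data at 80 9800h).

```python
def ldp(src: int) -> int:
    """Model LDP: the 16 MSBs of a 32-bit address become the DP value."""
    return (src & 0xFFFF_FFFF) >> 16


def direct_address(dp: int, offset16: int) -> int:
    """Direct addressing: DP's 16 LSBs select the page, the instruction supplies the offset."""
    return ((dp & 0xFFFF) << 16) | (offset16 & 0xFFFF)


dp = ldp(0x809900)                        # LDP @809900h
print(f"{dp:04X}h", f"{direct_address(dp, 0x9800):X}h")  # 0080h 809800h
```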

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value.

Example LDP @809900h, DP

or

LDP @809900h

Before Instruction:

DP = 6465h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 0080h (16 MSBs of the 32-bit src, zero-extended)
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Load Integer From Primary Register File to Expansion Register File LDPE 



Syntax LDPE src, dst

Operation src -> dst

Operands src register mode (any register in CPU primary register file)
dst expansion register file register (IVTP or TVTP)

Encoding
  Bits 31-23: 0 1 1 1 0 1 1 0 1
  Bits 22-21: 0 0
  Bits 20-16: dst
  Bits 15-0:  0s, with src in the low-order bits



Description LDPE loads the IVTP register (interrupt-vector table pointer) or the
TVTP register (trap-vector table pointer). These registers are described in
Section 3.2 on page 3-15.

The src operand register from the primary register file is loaded into the dst
register in the expansion register file. The dst operand is assumed to be an
integer.

Cycles 1



Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 

Example LDPE AR, TVTP ; set trap-vector pointer



LDPK Load Data-Page Pointer Immediate



Syntax LDPK src 

Operation src -> DP

Operands src 16-bit unsigned immediate 

Encoding
  Bits 31-23: 0 0 0 1 1 1 1 1 0
  Bits 22-21: 1 1
  Bits 20-16: 1 0 0 0 0
  Bits 15-0:  src





Description The 16-bit unsigned immediate value is loaded into the DP register. This
operation is completed by the end of the decode phase of the LDPK instruction;
thus, the value loaded is available to the next instruction for direct
addressing.



Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected.

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



Load Half-Word LHw 



Syntax LHw src, dst 

Operation Sign-extended half-word (0, 1) of src -> dst
w = half-word to load (0, 1)

| 1 | 0 | = w (half-word designator)

Operands src register, direct, or indirect addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
  Bits 31-24: 1 0 1 1 1 0 1 0
  Bit 23:     H
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Instruction Word Fields

  G    src addressing mode
  00   register mode (Rn, 0 ≤ n ≤ 31)
  01   direct mode
  10   indirect mode

  H    src half-word
  0    half-word 0 (LS half-word)
  1    half-word 1 (MS half-word)



Description The specified half-word of the src operand is sign-extended and right- 
shifted into the 16 LSBs of the dst register. The src half-word is signed. 
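The half-word selection and sign extension can be modeled the same way as LBb. A minimal Python sketch, assuming a 32-bit src; `lh` is a hypothetical name.

```python
def lh(src: int, w: int) -> int:
    """Model LHw: sign-extend half-word w (0 = LS, 1 = MS) of a 32-bit src."""
    if w not in (0, 1):
        raise ValueError("half-word designator w must be 0 or 1")
    half = (src >> (16 * w)) & 0xFFFF     # right-shift the selected half-word into the 16 LSBs
    if half & 0x8000:                     # negative half-word: fill the upper 16 bits with 1s
        half |= 0xFFFF0000
    return half


print(f"{lh(0xABCDEF12, 0):08X}h")  # FFFFEF12h
```

Because half-word 0 of ABCD EF12h is EF12h, whose most significant bit is 1, the sign extension fills the upper 16 bits with 1s.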

Cycles 1 



Status Bits If ST(SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST(SET COND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LH0 R1, R2

Before Instruction:

R1 = ABCD EF12h
R2 = 1234 5678h

After Instruction:

R1 = ABCD EF12h
R2 = FFFF EF12h
Load Half-Word Unsigned LHUw



Syntax LHUw src, dst
Operation Unsigned half-word (0, 1) of src -> dst
w = half-word to load (0, 1)

| 1 | 0 | = w (half-word designator)

Operands src register, direct, or indirect addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
  Bits 31-24: 1 0 1 1 1 0 1 1
  Bit 23:     H
  Bits 22-21: G
  Bits 20-16: dst
  Bits 15-0:  src





Instruction Word Fields

  G    src addressing mode
  00   register mode (any CPU register)
  01   direct mode
  10   indirect mode

  H    src half-word
  0    half-word 0 (LS half-word)
  1    half-word 1 (MS half-word)



Description The specified half-word of the src operand is right-shifted, if necessary, into
the 16 LSBs of the dst register and zero-extended through the 16 MSBs. The src
half-word is treated as an unsigned value.
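The unsigned form differs only in how the 16 MSBs of dst are filled. Here is a minimal C sketch under the same assumption that src has already been fetched. The name `lhu` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of LHUw (not a TI API).
   w selects half-word 0 (LS) or 1 (MS); the 16 MSBs are zero-filled. */
uint32_t lhu(unsigned w, uint32_t src)
{
    /* Shift the selected half-word down, then keep only 16 bits. */
    return (src >> (16u * (w & 1u))) & 0xFFFFu;
}
```

The 16 MSBs are always zero, so bit 31 of the result is never set. This is consistent with N always being 0 for this instruction.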

Cycles 1 






Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 
N 0. 

Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LHU0 R1,R2

Before Instruction:

R1 = ABCD EF12h
R2 = 1234 5678h 

After Instruction: 

R1 = ABCD EF12h 
R2 = 0000 EF12h 
Assembly Language Instructions



Logical Shift LSH 



Syntax LSH count, dst

Operation If count > 0:
dst << count -> dst
Else:
dst >> |count| -> dst

Operands count general addressing modes (G):
0 0 register (any CPU register)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding [bit-field diagram, bits 31 to 0: 0 0 0 0 1 0 0 1 1 | G | dst | count]



Description The seven least significant bits of the count operand are used to generate 
the twos-complement shift count. If the count operand is greater than zero, 
the dst operand is left-shifted by the value of the count operand. Low-order 
bits shifted in are zero-filled, and high-order bits are shifted out through the 
C (carry) bit. 

Logical left-shift: 

C <- dst <- 0

If the count operand is less than zero, the dst is right-shifted by the absolute 
value of the count operand. The high-order bits of the dst operand are zero- 
filled as they are shifted to the right. Low-order bits are shifted out through 
the C (carry) bit. 

Logical right-shift: 
0 -> dst -> C

If the count operand is 0, no shift is performed, and the C (carry) bit is set 
to 0. The count operand is assumed to be a signed integer, and the dst oper- 
and is assumed to be an unsigned integer. 
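The count decoding and carry behavior can be modeled in C. This is a sketch only. The helper names `lsh_count` and `lsh` are hypothetical, and flags other than C are not modeled. The handling of counts beyond ±32 follows the note in the LSH3 description, which states that it also applies to LSH.

```c
#include <stdint.h>

/* Decode the count operand: only its 7 LSBs are used, interpreted as a
   two's-complement value in the range -64..63. */
int lsh_count(uint32_t count_operand)
{
    int c = (int)(count_operand & 0x7Fu);
    return (c & 0x40) ? c - 0x80 : c;
}

/* Hypothetical model of LSH (not a TI API). Returns the shifted value
   and stores the resulting C (carry) bit in *carry. */
uint32_t lsh(uint32_t count_operand, uint32_t dst, int *carry)
{
    int n = lsh_count(count_operand);
    uint64_t v = dst;   /* 64-bit copy: shifts by 32 stay well-defined */

    if (n == 0) {                     /* no shift; C is cleared */
        *carry = 0;
        return dst;
    }
    if (n > 32) {                     /* per the LSH3 note: LSB ends up in C */
        *carry = (int)(dst & 1u);
        return 0;
    }
    if (n < -32) {                    /* per the LSH3 note: 0 ends up in C */
        *carry = 0;
        return 0;
    }
    if (n > 0) {                      /* left shift, zero-fill from the right */
        uint64_t r = v << n;
        *carry = (int)((r >> 32) & 1u);        /* last bit shifted out */
        return (uint32_t)r;
    }
    n = -n;                           /* right shift, zero-fill from the left */
    *carry = (int)((v >> (n - 1)) & 1u);       /* last bit shifted out */
    return (uint32_t)(v >> n);
}
```

With the first example's operands, `lsh(24, 0x2AC, &c)` returns AC00 0000h with C = 0. A count operand of FFFF FFF4h decodes to -12 and selects a right shift.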



Cycles 1



Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Set to the value of the last bit shifted out. 0 for a shift count of 0. 



Mode Bit OVM Operation is not affected by OVM bit value.

Example LSH R4,R7

Before Instruction:

R4 = 018h = 24
R7 = 02ACh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R4 = 018h = 24
R7 = 0AC00 0000h
LUF LV UF N Z V C = 0 0 0 1 0 0 0

Example LSH *-AR5(IR1),R5

Before Instruction:

AR5 = 80 9908h
IR1 = 4h
R5 = 00 12C0 0000h
Data at 80 9904h = 0FFFF FFF4h = -12
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR5 = 80 9908h
IR1 = 4h
R5 = 00 0001 2C00h
Data at 80 9904h = 0FFFF FFF4h = -12
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Logical Shift, 3 Operands LSH3 



Syntax LSH3 count, src, dst
Operation If count > 0:
src << count -> dst
Else:
src >> |count| -> dst

Operands src, count both type 1 or type 2 three-operand addressing modes 
dst register mode (any register in CPU primary register file) 

Encoding
Type 1 [bit-field diagram, bits 31 to 0: 0 0 1 0 0 1 0 0 0 | T | dst | src | cnt]
Type 2 [bit-field diagram, bits 31 to 0: 0 0 1 1 0 1 0 0 0 | T | dst | src | cnt]



Instruction Word Fields

Type 1
T    src1 addressing modes                        src2 addressing modes
00   register mode (any CPU register)             register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)        register mode (any CPU register)
10   register mode (any CPU register)             indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)        indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                               src2 addressing modes
00   register mode (any CPU register)                    8-bit signed immediate
01   register mode (any CPU register)                    indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)    8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)   indirect mode *+ARn2(5-bit unsigned displacement)



Description The seven least significant bits of the count operand are used to generate
the twos-complement shift count.

If the count operand is greater than zero, the src operand is left-shifted by
the value of the count operand and the result is loaded into dst. Low-order
bits shifted in are zero-filled, and high-order bits are shifted out through
the C (carry) bit.

Logical left-shift:

C <- src <- 0

If the count operand is less than zero, the src operand is right-shifted by the
absolute value of the count operand and the result is loaded into dst. The
high-order bits are zero-filled as they are shifted to the right. Low-order bits
are shifted out through the C (carry) bit.

Logical right-shift:

0 -> src -> C

If the count operand is 0, no shift is performed and the C (carry) bit is set
to 0. The count operand is assumed to be a signed integer. The src and dst
operands are assumed to be unsigned integers.

If count is greater than 32, the LSB of src ends up in the carry (C) bit. If count
is less than -32, 0 ends up in the carry bit. This also applies to LSH.

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Set to the value of the last bit shifted out. 0 for a shift count of 0. Unaffected
if dst is not R0 - R7.

Mode Bit OVM Operation is not affected by OVM bit value. 






Parallel LSH3 and STI LSH3||STI



Syntax LSH3 count, src2, dst1
|| STI src3, dst2

Operation If count > 0:
src2 << count -> dst1
Else:
src2 >> |count| -> dst1
|| src3 -> dst2

Operands count register (R0 - R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding [bit-field diagram, bits 31 to 0: opcode bits | dst1 | count | src3 | dst2 | src2]



Description The seven least significant bits of the count operand are used to generate
the twos-complement shift count.

If the count operand is greater than zero, the src2 operand is left-shifted by
the value of the count operand and the result is loaded into dst1. Low-order
bits shifted in are zero-filled, and high-order bits are shifted out through
the C (carry) bit.

Logical left-shift:

C <- src2 <- 0

If the count operand is less than zero, the src2 operand is right-shifted by the
absolute value of the count operand and the result is loaded into dst1. The
high-order bits are zero-filled as they are shifted to the right. Low-order bits
are shifted out through the C (carry) bit.

Logical right-shift:

0 -> src2 -> C

If the count operand is 0, no shift is performed and the C (carry) bit is set to 0.

The count operand is assumed to be a 7-bit signed integer, and the src2 and
dst1 operands are assumed to be unsigned integers. All registers are read
at the beginning and loaded at the end of the execute cycle. This means that
if one of the parallel operations (STI) reads from a register and the operation
being performed in parallel (LSH3) writes to the same register, then STI
accepts as input the contents of the register before it is modified by the LSH3.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.
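The read-at-start, write-at-end rule can be illustrated with a small C model. Everything in it is hypothetical: the `machine` struct, the register and memory sizes, and the function names were invented for this sketch, and the carry and condition flags are not modeled. The sketch shows two things. STI stores the value a register held before LSH3 overwrote it. When src2 and dst2 refer to the same location, src2 is read before dst2 is written.

```c
#include <stdint.h>

/* Hypothetical machine state for this sketch only: eight registers
   (integer view) and a tiny word-addressed data memory. */
typedef struct {
    uint32_t r[8];
    uint32_t mem[16];
} machine;

/* Logical shift with the 7-bit count already decoded. Counts of 32 or
   more in magnitude simply return 0 here; carry is not modeled. */
static uint32_t shift_logical(uint32_t v, int n)
{
    if (n >= 32 || n <= -32)
        return 0;
    return n >= 0 ? v << n : v >> -n;
}

/* LSH3 count, src2, dst1 || STI src3, dst2.
   count/dst1/src3 are register numbers; src2/dst2 are memory addresses. */
void lsh3_sti(machine *m, int count, unsigned src2, int dst1,
              int src3, unsigned dst2)
{
    /* Read phase: every source is sampled before anything is written. */
    uint32_t count_op = m->r[count];
    uint32_t src2_val = m->mem[src2];
    uint32_t src3_val = m->r[src3];

    int n = (int)(count_op & 0x7Fu);   /* 7-bit two's-complement count */
    if (n & 0x40)
        n -= 0x80;

    /* Write phase: both destinations are updated at the end of the cycle. */
    m->r[dst1]   = shift_logical(src2_val, n);
    m->mem[dst2] = src3_val;
}
```

Making the snapshot explicit (local copies first, stores last) is what guarantees the documented ordering regardless of which operands alias.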



Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected.
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Set to the value of the last bit shifted out. 0 for a shift count of 0. 

Mode Bit OVM Operation is not affected by OVM bit value.

Example LSH3 R2,*++AR3(1),R0
|| STI R4,*-AR5

Before Instruction:

R2 = 18h = 24
AR3 = 80 98C2h
R0 = 0h
R4 = 0DCh = 220
AR5 = 80 98A3h
Data at 80 98C3h = 0ACh
Data at 80 98A2h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R2 = 18h = 24
AR3 = 80 98C3h
R0 = 0AC00 0000h
R4 = 0DCh = 220
AR5 = 80 98A3h
Data at 80 98C3h = 0ACh
Data at 80 98A2h = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 1 0 0 0



Example LSH3 R7,*AR2--(1),R2
|| STI R0,*+AR0(1)

Before Instruction:

R7 = 0FFFF FFF4h = -12
AR2 = 80 9863h
R2 = 0h
R0 = 12Ch = 300
AR0 = 80 98B7h
Data at 80 9863h = 2C00 0000h
Data at 80 98B8h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 0FFFF FFF4h = -12
AR2 = 80 9862h
R2 = 2C000h
R0 = 12Ch = 300
AR0 = 80 98B7h
Data at 80 9863h = 2C00 0000h
Data at 80 98B8h = 12Ch = 300
LUF LV UF N Z V C = 0 0 0 0 0 0 0
LWLct Load Word Left-Shifted



Syntax LWLct src, dst 

Operation src << {0, 1, 2, or 3} bytes, merged with dst -> dst

Operands ct the count of bytes {0, 1, 2, or 3} to shift left (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes 
dst register mode (any register in CPU primary register file) 

Encoding [bit-field diagram, bits 31 to 0: 1 0 1 1 0 1 0 | B | G | dst | src]



Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode

B    src byte
00   no shift
01   shift left 1 byte space
10   shift left 2 byte spaces
11   shift left 3 byte spaces



Description The src operand is left shifted the specified number of bytes and merged 
with the bytes of the dst register that are below the left-shifted LSB of the 
src register. 
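The shift-and-merge can be expressed as a mask operation. Here is a C sketch that assumes a 32-bit src value that has already been fetched. `lwl` is a hypothetical helper name, not a TI API.

```c
#include <stdint.h>

/* Hypothetical model of LWLct (not a TI API): shift src left ct bytes and
   keep the dst bytes that lie below the shifted LSB of src. */
uint32_t lwl(unsigned ct, uint32_t src, uint32_t dst)
{
    unsigned shift = 8u * (ct & 3u);                /* 0, 8, 16, or 24 bits */
    uint32_t keep  = ((uint32_t)1 << shift) - 1u;   /* dst bytes kept below src */
    return (src << shift) | (dst & keep);
}
```

`lwl(2, 0xABCDEF12, 0x12345678)` returns EF12 5678h, because the two dst bytes below the shifted LSB of src are kept.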

Cycles 1 



Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LWL2 R1,R2

Before Instruction:

R1 = ABCD EF12h
R2 = 1234 5678h

After Instruction:

R1 = ABCD EF12h (remains unchanged)
     EF12 0000h (left-shifted interim value)

R2 = EF12 5678h (contents merged)



LWRct Load Word Right-Shifted



Syntax LWRct src, dst 

Operation src >> {0, 1, 2, or 3} bytes, merged with dst -> dst

Operands ct the count of bytes {0, 1, 2, or 3} to shift right (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes 
dst register mode (any register in CPU primary register file) 

Encoding [bit-field diagram, bits 31 to 0: 1 0 1 1 0 1 1 | B | G | dst | src]





Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode

B    src byte
00   no shift
01   shift right 1 byte space
10   shift right 2 byte spaces
11   shift right 3 byte spaces



Description The src operand is right shifted the specified number of bytes and merged 
with the bytes of the dst register that are above the right-shifted MSB of the 
src register. Sign is not extended. 
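The right-shifted variant keeps the dst bytes above the shifted MSB instead. Here is a C sketch with the hypothetical helper name `lwr`; it assumes src has already been fetched.

```c
#include <stdint.h>

/* Hypothetical model of LWRct (not a TI API): shift src right ct bytes and
   keep the dst bytes that lie above the shifted MSB of src. */
uint32_t lwr(unsigned ct, uint32_t src, uint32_t dst)
{
    unsigned shift    = 8u * (ct & 3u);          /* 0, 8, 16, or 24 bits */
    uint32_t from_src = 0xFFFFFFFFu >> shift;    /* bits filled by src */
    /* Logical right shift: no sign extension; upper dst bytes are kept. */
    return (src >> shift) | (dst & ~from_src);
}
```

`lwr(1, 0xABCDEF12, 0x12345678)` returns 12AB CDEFh.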

Cycles 1 



Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example LWR1 AR1,R2

Before Instruction:

AR1 = ABCD EF12h
R2 = 1234 5678h

After Instruction:

AR1 = ABCD EF12h (remains unchanged)
      00AB CDEFh (right-shifted interim value)

R2 = 12AB CDEFh (contents merged)



MBct Merge Byte, Left-Shifted



Syntax MBct src, dst 

Operation 8 LSBs of src << {0, 1, 2, or 3} bytes, merged with dst -> dst

Operands ct the count of bytes {0, 1, 2, or 3} to shift left (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes 
dst register mode (any register in CPU primary register file) 

Encoding [bit-field diagram, bits 31 to 0: 1 0 1 1 1 0 0 | B | G | dst | src]



Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode

B    src byte
00   no shift
01   shift left 1 byte space
10   shift left 2 byte spaces
11   shift left 3 byte spaces



Description The 8 LSBs of the src operand are left-shifted 0, 1, 2, or 3 bytes and replace
the corresponding byte of the dst register. The other three bytes of dst are unchanged.
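One reading of "merged" is that the selected byte of dst is replaced. That reading matches the example result (1234 5678h becomes 1212 5678h for MB2). Under it, the operation can be sketched in C. The helper name `mb` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of MBct (not a TI API): the 8 LSBs of src replace
   byte ct of dst; the other three bytes of dst are left unchanged. */
uint32_t mb(unsigned ct, uint32_t src, uint32_t dst)
{
    unsigned shift = 8u * (ct & 3u);               /* 0, 8, 16, or 24 bits */
    uint32_t field = (uint32_t)0xFFu << shift;     /* byte of dst replaced */
    return (dst & ~field) | ((src & 0xFFu) << shift);
}
```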

Cycles 1 



Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example MB2 AR1,AR2

Before Instruction:

AR1 = ABCD EF12h
AR2 = 1234 5678h

After Instruction:

AR1 = ABCD EF12h (remains unchanged)
      0012 0000h (left-shifted interim value)

AR2 = 1212 5678h (contents merged)



MHct Merge Half-Word, Left-Shifted



Syntax MHct src, dst 

Operation 16 LSBs of src << {0, 1} half-words, merged with dst -> dst

Operands ct the count of half-word (16-bit) shifts {0, 1}

src register, direct, or indirect addressing modes 

dst register mode (any register in CPU primary register file) 



Encoding [bit-field diagram, bits 31 to 0: 1 0 1 1 1 1 0 0 | H | G | dst | src]





Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode

H    half-word shift
0    no shift
1    shift left 1 half-word (16 bits)



Description The 16 LSBs of the src operand are left-shifted 0 or 1 half-words and replace
the corresponding half-word of the dst register. The other half-word of dst is unchanged.
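Read "merged" as replacing the selected half-word of dst, which matches the example result EF12 5678h. Under that reading, here is a C sketch with the hypothetical helper name `mh`.

```c
#include <stdint.h>

/* Hypothetical model of MHct (not a TI API): the 16 LSBs of src replace
   half-word ct of dst; the other half-word of dst is left unchanged. */
uint32_t mh(unsigned ct, uint32_t src, uint32_t dst)
{
    unsigned shift = 16u * (ct & 1u);               /* 0 or 16 bits */
    uint32_t field = (uint32_t)0xFFFFu << shift;    /* half-word replaced */
    return (dst & ~field) | ((src & 0xFFFFu) << shift);
}
```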

Cycles 1 



Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example MH1 AR1,AR2

Before Instruction:

AR1 = ABCD EF12h
AR2 = 1234 5678h

After Instruction:

AR1 = ABCD EF12h (remains unchanged)
      EF12 0000h (left-shifted interim value)

AR2 = EF12 5678h (contents merged)



MPYF Multiply Floating-Point Values



Syntax MPYF src, dst

Operation dst x src -> dst

Operands src general addressing modes (G):
0 0 register (R0 - R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0 - R11)



Encoding [bit-field diagram, bits 31 to 0: 0 0 0 0 1 0 1 0 0 | G | dst | src]





Description The product of the dst and src operands is loaded into the dst register. The
src operand is assumed to be a single-precision floating-point number, and
the dst operand is an extended-precision floating-point number.
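Register values such as those in the example can be checked by decoding the extended-precision encoding in C. This sketch assumes the TMS320C40 extended-precision layout covered in Chapter 4:

- an 8-bit two's-complement exponent e, followed by a sign bit and a 31-bit fraction f;
- value (1 + f) x 2^e for positive numbers and (-2 + f) x 2^e for negative numbers;
- exponent 80h (-128) treated as zero.

The helper name is hypothetical. A host double stands in for the result, and the device's rounding is not modeled.

```c
#include <stdint.h>

/* Hypothetical decoder (not a TI API) for a 40-bit extended-precision
   value: exp_field holds bits 39-32, mant holds the sign (bit 31) and the
   31-bit fraction (bits 30-0). Returns a host double. */
double c4x_ext_to_double(uint8_t exp_field, uint32_t mant)
{
    /* The exponent is an 8-bit two's-complement number. */
    int e = (exp_field & 0x80u) ? (int)exp_field - 256 : (int)exp_field;
    if (e == -128)
        return 0.0;                     /* exponent 80h: treated as zero */

    double f = (double)(mant & 0x7FFFFFFFu) / 2147483648.0;   /* f / 2^31 */
    double m = (mant & 0x80000000u) ? -2.0 + f : 1.0 + f;     /* 10.f or 01.f */

    /* Scale by 2^e with exact multiplications (no <math.h> needed). */
    for (; e > 0; --e)
        m *= 2.0;
    for (; e < 0; ++e)
        m /= 2.0;
    return m;
}
```

The decoded operands of the example are 140.5 and 12.7578125. They multiply exactly to 1792.47265625, which is the value encoded by 0A 600F 2000h.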

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example MPYF R0,R2

Before Instruction:

R0 = 07 0C80 0000h = 1.4050e+02
R2 = 03 4C20 0000h = 1.27578125e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 07 0C80 0000h = 1.4050e+02
R2 = 0A 600F 2000h = 1.79247266e+03
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Multiply Floating-Point Values, 3 Operands MPYF3 



Syntax MPYF3 src2, src1, dst
Operation src1 x src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (R0 - R11)

Encoding
Type 1 [bit-field diagram, bits 31 to 0: 0 0 1 0 0 1 0 0 1 | T | dst | src1 | src2]
Type 2 [bit-field diagram, bits 31 to 0: 0 0 1 1 0 1 0 0 1 | T | dst | src1 | src2]





Instruction Word Fields

Type 1
T    src1 addressing modes                        src2 addressing modes
00   register mode (R0 - R11)                     register mode (R0 - R11)
01   indirect mode (disp = 0, 1, IR0, IR1)        register mode (any CPU register)
10   register mode (R0 - R11)                     indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)        indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                               src2 addressing modes
01   register mode (R0 - R11)                            indirect mode *+ARn(5-bit unsigned displacement)
11   indirect mode *+ARn1(5-bit unsigned displacement)   indirect mode *+ARn2(5-bit unsigned displacement)



Description The product of src1 and src2 is loaded into the dst register. The values at
src1, src2, and dst are extended-precision floating-point numbers.



Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






Parallel MPYF3 and ADDF3 MPYF3||ADDF3



Syntax MPYF3 srcA, srcB, dst1
|| ADDF3 srcC, srcD, dst2

Operation srcA x srcB -> dst1
|| srcC + srcD -> dst2

Operands srcA, srcB, srcC, srcD: any two must be indirect (disp = 0, 1, IR0, IR1),
and any two must be register (R0 - R7)

dst1 register (d1):
0 = R0
1 = R1

dst2 register (d2):
0 = R2
1 = R3

src1 register (R0 - R7)
src2 register (R0 - R7)
src3 indirect (disp = 0, 1, IR0, IR1)
src4 indirect (disp = 0, 1, IR0, IR1)

P parallel addressing modes (0 ≤ P ≤ 3)

P    Operation
00   src3 x src4, src1 + src2
01   src3 x src1, src4 + src2
10   src1 x src2, src3 + src4
11   src3 x src1, src2 + src4

Encoding [bit-field diagram, bits 31 to 0: 1 0 | 0 0 0 0 | P | d1 | d2 | src1 | src2 | src3 | src4]



Description A floating-point multiplication and a floating-point addition are performed in 
parallel. All registers are read at the beginning and loaded at the end of the 
execute cycle. This means that if one of the parallel operations (MPYF3)
reads from a register and the operation being performed in parallel (ADDF3) 
writes to the same register, then MPYF3 accepts as input the contents of 
the register before it is modified by the ADDF3. 
Any combination of addressing modes may be coded for the four possible
source operands as long as two are coded as indirect and two are register.
The assignment of the source operands srcA - srcD to the src1 - src4 fields
varies, depending on the combination of addressing modes used; the P field
is encoded accordingly. The assembler may, when not significant, change
the order of operands in commutative operations in order to simplify
processing.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 0. 

Z 0. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 
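The P-field table can be read as a routing table from the encoded src1 - src4 fields to the multiplier and adder inputs. The C sketch below is hypothetical throughout: host doubles stand in for the device's floating-point values, and the struct and function names were invented for illustration. Both results are computed from values read up front, mirroring the read-at-start, write-at-end behavior.

```c
/* Which encoded source field (1-4 = src1-src4) feeds each input. */
typedef struct {
    int mul_a, mul_b;   /* multiplier operands */
    int add_a, add_b;   /* adder operands */
} routing;

/* Routing per the P-field operation table for MPYF3 || ADDF3. */
routing p_field_routing(unsigned p)
{
    static const routing table[4] = {
        {3, 4, 1, 2},   /* 00: src3 x src4, src1 + src2 */
        {3, 1, 4, 2},   /* 01: src3 x src1, src4 + src2 */
        {1, 2, 3, 4},   /* 10: src1 x src2, src3 + src4 */
        {3, 1, 2, 4},   /* 11: src3 x src1, src2 + src4 */
    };
    return table[p & 3u];
}

/* Execute one MPYF3 || ADDF3 on already-fetched values src[0..3]
   (= src1..src4). Both results use only the values read up front. */
void mpyf3_addf3(unsigned p, const double src[4], double *dst1, double *dst2)
{
    routing r = p_field_routing(p);
    double product = src[r.mul_a - 1] * src[r.mul_b - 1];
    double sum     = src[r.add_a - 1] + src[r.add_b - 1];
    *dst1 = product;
    *dst2 = sum;
}
```

Take the operands of the example that follows:

- src1 = R5 = 179.75 and src2 = R7 = 140.5;
- the indirect operands 34C 0000h (12.75) and 111 0000h (2.265625) in src3 and src4.

With these, P = 00 yields 28.88671875 for dst1 and 320.25 for dst2, matching the R0 and R3 results shown.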



Example MPYF3 *AR5++(1),*--AR1(IR0),R0
|| ADDF3 R5,R7,R3

Before Instruction:

AR5 = 80 98C5h
AR1 = 80 98A8h
IR0 = 4h
R0 = 0h
R5 = 07 33C0 0000h = 1.79750e+02
R7 = 07 0C80 0000h = 1.4050e+02
R3 = 0h
Data at 80 98C5h = 34C 0000h = 1.2750e+01
Data at 80 98A4h = 111 0000h = 2.265625e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR5 = 80 98C6h
AR1 = 80 98A4h
IR0 = 4h
R0 = 04 6718 0000h = 2.88867188e+01
R5 = 07 33C0 0000h = 1.79750e+02
R7 = 07 0C80 0000h = 1.4050e+02
R3 = 08 2020 0000h = 3.20250e+02
Data at 80 98C5h = 34C 0000h = 1.2750e+01
Data at 80 98A4h = 111 0000h = 2.265625e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
MPYF3||STF Parallel MPYF3 and STF



Syntax MPYF3 src2, src1, dst1
|| STF src3, dst2

Operation src1 x src2 -> dst1
|| src3 -> dst2

Operands src1 register (R0 - R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding [bit-field diagram, bits 31 to 0: opcode bits | dst1 | src1 | src3 | dst2 | src2]



Description A floating-point multiplication and a floating-point store are performed in 
parallel. All registers are read at the beginning and loaded at the end of the 
execute cycle. This means that if one of the parallel operations (MPYF3) 
writes to a register and the operation being performed in parallel (STF) 
reads from the same register, then the STF accepts as input the contents 
of the register before it is modified by the MPYF3. 

If src2 and dst2 point to the same location, then src2 is read before the write 
to dst2. 

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example MPYF3 *-AR2(1),R7,R0
|| STF R3,*AR0--(IR0)

Before Instruction:

AR2 = 80 982Bh
R7 = 05 7B40 0000h = 6.281250e+01
R0 = 0h
R3 = 08 6B28 0000h = 4.7031250e+02
AR0 = 80 9860h
IR0 = 8h
Data at 80 982Ah = 70C 8000h = 1.4050e+02
Data at 80 9860h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR2 = 80 982Bh
R7 = 05 7B40 0000h = 6.281250e+01
R0 = 0D 09E4 A000h = 8.82515625e+03
R3 = 08 6B28 0000h = 4.7031250e+02
AR0 = 80 9858h
IR0 = 8h
Data at 80 982Ah = 70C 8000h = 1.4050e+02
Data at 80 9860h = 86B 2800h = 4.7031250e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0



MPYF3||SUBF3 Parallel MPYF3 and SUBF3



Syntax MPYF3 srcA, srcB, dst1
|| SUBF3 srcC, srcD, dst2

Operation srcA x srcB -> dst1
|| srcD - srcC -> dst2

Operands srcA, srcB, srcC, srcD: any two must be indirect (disp = 0, 1, IR0, IR1),
and any two must be register (R0 - R7)

dst1 register (d1):
0 = R0
1 = R1

dst2 register (d2):
0 = R2
1 = R3

src1 register (R0 - R7)
src2 register (R0 - R7)
src3 indirect (disp = 0, 1, IR0, IR1)
src4 indirect (disp = 0, 1, IR0, IR1)

P parallel addressing modes (0 ≤ P ≤ 3)

P    Operation
00   src3 x src4, src1 - src2
01   src3 x src1, src4 - src2
10   src1 x src2, src3 - src4
11   src3 x src1, src2 - src4

Encoding [bit-field diagram, bits 31 to 0: 1 0 | 0 0 0 1 | P | d1 | d2 | src1 | src2 | src3 | src4]



Description A floating-point multiplication and a floating-point subtraction are performed 
in parallel. All registers are read at the beginning and loaded at the end of 
the execute cycle. This means that if one of the parallel operations (MPYF3)
reads from a register, and the operation being performed in parallel 
(SUBF3) writes to the same register, then MPYF3 accepts as input the con- 
tents of the register before it is modified by the SUBF3. 

Any combination of addressing modes may be coded for the four possible 
source operands as long as two are coded as indirect and two are register. 
The assignment of the source operands srcA - srcD to the src1 - src4 fields
varies, depending on the combination of addressing modes used; the P field 
is encoded accordingly. The assembler may, when not significant, change 
the order of operands in commutative operations in order to simplify pro- 
cessing. 

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 0. 

Z 0. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






Example MPYF3 R5,*++AR7(IR1),R0
|| SUBF3 R7,*AR3--(1),R2

or

MPYF3 *++AR7(IR1),R5,R0
|| SUBF3 R7,*AR3--(1),R2

Before Instruction:

R5 = 03 4C00 0000h = 1.2750e+01
AR7 = 80 9904h
IR1 = 8h
R0 = 0h
R7 = 07 33C0 0000h = 1.79750e+02
AR3 = 80 98B2h
R2 = 0h
Data at 80 990Ch = 111 0000h = 2.265625e+00
Data at 80 98B2h = 70C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R5 = 03 4C00 0000h = 1.2750e+01
AR7 = 80 990Ch
IR1 = 8h
R0 = 04 6718 0000h = 2.88867188e+01
R7 = 07 33C0 0000h = 1.79750e+02
AR3 = 80 98B1h
R2 = 05 E300 0000h = -3.9250e+01
Data at 80 990Ch = 111 0000h = 2.265625e+00
Data at 80 98B2h = 70C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Multiply Integer MPYI 



Syntax MPYI src, dst 
Operation dst x src -> dst 

Operands src general addressing modes (G): 

0 0 register (any CPU register) 

0 1 direct 

1 0 indirect 
1 1 immediate 

dst register (any register in CPU primary register file) 

Encoding [bit-field diagram, bits 31 to 0: 0 0 0 0 1 0 1 0 1 | G | dst | src]



Description The product of the dst and src operands is loaded into the dst register. The
src and dst operands, when read, are assumed to be 32-bit signed integers.
The result is assumed to be a 64-bit signed integer. The output to the dst
register is the 32 least significant bits of the result.

Integer overflow occurs when any of the most significant 32 bits of the 64-bit
result differs from the most significant bit of the 32-bit output value.
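The overflow rule amounts to checking whether the 64-bit product still fits in 32 signed bits. Here is a C sketch with a hypothetical helper name. OVM saturation and the other condition flags are not modeled.

```c
#include <stdint.h>

/* Hypothetical model of MPYI (not a TI API): multiply two 32-bit signed
   integers, return the 32 LSBs of the 64-bit product, and report integer
   overflow. OVM saturation is not modeled. */
uint32_t mpyi(int32_t src, int32_t dst, int *overflow)
{
    int64_t  full = (int64_t)dst * (int64_t)src;   /* exact 64-bit product */
    uint32_t low  = (uint32_t)full;                /* 32 LSBs (modulo 2^32) */

    /* Sign-extend the 32-bit output back to 64 bits. Overflow means some of
       the upper 32 bits differ from bit 31 of the output, i.e. the
       sign-extended output no longer equals the full product. */
    int64_t low_sx = (int64_t)(low ^ 0x80000000u) - INT64_C(0x80000000);
    *overflow = (low_sx != full);
    return low;
}
```

With the example operands, 3,392,081 x 7,910,912 leaves E21D 9600h (-501,377,536) in the 32 LSBs and sets the overflow indication, because the full product needs more than 32 bits.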
Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.
LUF Unchanged. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C Unaffected. 

Mode Bit OVM Operation is affected by OVM bit value.

Example MPYI R1,R5

Before Instruction:

R1 = 00 0033 C251h = 3,392,081
R5 = 00 0078 B600h = 7,910,912
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R1 = 00 0033 C251h = 3,392,081
R5 = 00 E21D 9600h = -501,377,536
LUF LV UF N Z V C = 0 1 0 1 0 1 0



MPYI3 Multiply Integer, 3 Operands 



Syntax MPYI3 src2, src1, dst

Operation src1 x src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1 [bit-field diagram, bits 31 to 0: 0 0 1 0 0 1 0 1 0 | T | dst | src1 | src2]
Type 2 [bit-field diagram, bits 31 to 0: 0 0 1 1 0 1 0 1 0 | T | dst | src1 | src2]



Instruction Word Fields

Type 1
T    src1 addressing modes                  src2 addressing modes
00   register mode (any CPU register)       register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)  register mode (any CPU register)
10   register mode (any CPU register)       indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)  indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                              src2 addressing modes
00   register mode (any CPU register)                   8-bit signed immediate
01   register mode (any CPU register)                   indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)   8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)  indirect mode *+ARn2(5-bit unsigned displacement)
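A small decoder can exercise the two field tables above. This Python sketch assumes the bit positions given in the encoding layout for this instruction: opcode in bits 31-23, T in 22-21, dst in 20-16, src1 in 15-8, and src2 in 7-0, with bit 28 distinguishing type 1 from type 2. The function names are hypothetical. The sketch does not decode the internal layout of the indirect-mode fields.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_three_operand(word: int) -> dict:
    """Split a 32-bit three-operand instruction word into its fields.

    Assumed layout: bits 31-23 opcode, 22-21 T, 20-16 dst,
    15-8 src1, 7-0 src2; bit 28 = 0 for type 1, 1 for type 2.
    """
    fields = {
        "opcode": (word >> 23) & 0x1FF,
        "type": 2 if (word >> 28) & 1 else 1,
        "T": (word >> 21) & 0x3,
        "dst": (word >> 16) & 0x1F,
        "src1": (word >> 8) & 0xFF,
        "src2": word & 0xFF,
    }
    # Type 2 with T = 00 or 10: src2 holds an 8-bit signed immediate.
    if fields["type"] == 2 and fields["T"] in (0b00, 0b10):
        fields["imm"] = sign_extend(fields["src2"], 8)
    return fields
```

For a type 2 MPYI3 word with T = 00, the low byte is reported both raw and as a sign-extended immediate. For example, a low byte of FBh decodes to -5.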



Description The product of the numbers at src1 and src2 is loaded into the dst register.

The multiplied numbers are assumed to be 32-bit signed integers. The result is assumed to be a signed 64-bit integer. The output to the dst register is the 32 least significant bits of the result.



Cycles 1 
Assembly Language Instructions




Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SET COND) = 1, they are modified for all destination registers.
LUF Unchanged. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C Unaffected. 

Mode Bit OVM Operation is affected by OVM bit value. 

Note Integer overflow occurs when any of the most significant 32 bits of the 64-bit result differs from the most significant bit of the 32-bit output value.
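The truncation to 32 LSBs and the overflow rule in the note can be checked numerically. The following Python sketch assumes OVM = 0, so no saturation is applied, and models registers as plain 32-bit integers. `mpyi3` and `to_signed32` are illustrative names, not TI tool functions.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mpyi3(src1: int, src2: int) -> tuple[int, bool]:
    """Model MPYI3 with OVM = 0 and return (dst, overflow)."""
    # Both sources are read as signed 32-bit integers and the exact
    # signed 64-bit product is formed.
    product = to_signed32(src1) * to_signed32(src2)
    dst = product & MASK32  # the 32 LSBs are written to dst
    # Overflow: any of the upper 32 bits of the 64-bit result differs
    # from the MSB of the 32-bit output (i.e. they are not a sign fill).
    upper = (product >> 32) & MASK32
    sign_fill = MASK32 if dst & 0x8000_0000 else 0
    return dst, upper != sign_fill
```

The MPYI example above multiplies 3,392,081 by 7,910,912. For those operands, `mpyi3(0x0033C251, 0x0078B600)` returns `(0xE21D9600, True)`. The 32 LSBs read as -501,377,536, and the overflow indication matches the LV and V bits shown in that example.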



MPYI3||ADDI3 Parallel MPYI3 and ADDI3

Syntax MPYI3 srcA, srcB, dst1
|| ADDI3 srcC, srcD, dst2

Operation srcA × srcB → dst1
|| srcD + srcC → dst2

Operands srcA, srcB, srcC, srcD: any two must be indirect (disp = 0, 1, IR0, IR1), and any two must be register (R0-R7)
dst1 register (d1): 0 = R0, 1 = R1
dst2 register (d2): 0 = R2, 1 = R3
src1 register (R0-R7)
src2 register (R0-R7)
src3 indirect (disp = 0, 1, IR0, IR1)
src4 indirect (disp = 0, 1, IR0, IR1)
P parallel addressing modes (0 ≤ P ≤ 3)

P Field   Operation
00        src3 × src4, src1 + src2
01        src3 × src1, src4 + src2
10        src1 × src2, src3 + src4
11        src3 × src1, src2 + src4

Encoding
Bits  31-30  29-26  25-24  23  22  21-19  18-16  15-8  7-0
      10     0010   P      d1  d2  src1   src2   src3  src4



Description An integer multiplication and an integer addition are performed in parallel. 

All registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (MPYI3) reads from 
a register and the operation being performed in parallel (ADDI3) writes to 
the same register, then MPYI3 accepts as input the contents of the register 
before it is modified by the ADDI3. 
Any combination of addressing modes may be coded for the four possible source operands as long as two are coded as indirect and two are register. The assignment of the source operands srcA-srcD to the src1-src4 fields varies, depending on the combination of addressing modes used; the P field is encoded accordingly. The assembler may, when not significant, change the order of operands in commutative operations in order to simplify processing.
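The P-field table for this instruction determines which encoded source fields feed the multiplier and which feed the adder. The following Python sketch encodes that table as a lookup. It models only the arithmetic (32 LSBs, no OVM saturation, no flag updates). The names `P_FIELD_ROUTING` and `mpyi3_addi3` are illustrative, not TI definitions. All four operand values are assumed to have been fetched before either result is written.

```python
MASK32 = 0xFFFF_FFFF

# (multiplier inputs, adder inputs) for each P value, per the P-field table.
P_FIELD_ROUTING = {
    0b00: (("src3", "src4"), ("src1", "src2")),
    0b01: (("src3", "src1"), ("src4", "src2")),
    0b10: (("src1", "src2"), ("src3", "src4")),
    0b11: (("src3", "src1"), ("src2", "src4")),
}


def mpyi3_addi3(p: int, operands: dict[str, int]) -> tuple[int, int]:
    """Return (dst1, dst2) given the P field and the src1-src4 values."""
    (m1, m2), (a1, a2) = P_FIELD_ROUTING[p]
    # The 32 LSBs of a product or sum are the same for signed and
    # unsigned interpretations, so plain masking is enough here.
    product = (operands[m1] * operands[m2]) & MASK32
    total = (operands[a1] + operands[a2]) & MASK32
    return product, total
```

In this instruction's example, the registers feed the multiplier and the two indirect operands feed the adder, which corresponds to P = 10. Using the example values 20, 100, -53 (0FFFF FFCBh), and 53, the sketch returns 2000 and 0, matching R0 and R3.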

Cycles 1

Status Bits LUF Unchanged.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 0.
Z 0.
V 1 if an integer overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is affected by OVM bit value.

Example MPYI3 R7,R4,R0
|| ADDI3 *-AR3,*AR5--(1),R3

Before Instruction:

R7 = 14h = 20
R4 = 64h = 100
R0 = 0h
AR3 = 80 981Fh
AR5 = 80 996Eh
R3 = 0h
Data at 80 981Eh = 0FFFF FFCBh = -53
Data at 80 996Eh = 35h = 53
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 14h = 20
R4 = 64h = 100
R0 = 07D0h = 2000
AR3 = 80 981Fh
AR5 = 80 996Dh
R3 = 0h
Data at 80 981Eh = 0FFFF FFCBh = -53
Data at 80 996Eh = 35h = 53
LUF LV UF N Z V C = 0 0 0 0 0 0 0
MPYI3||STI Parallel MPYI3 and STI

Syntax MPYI3 src2, src1, dst1
|| STI src3, dst2

Operation src1 × src2 → dst1
|| src3 → dst2

Operands src1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
Bits  31-30  29-25  24-22  21-19  18-16  15-8  7-0
      11     10000  dst1   src1   src3   dst2  src2



Description An integer multiplication and an integer store are performed in parallel. All 
registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (STI) reads from a 
register and the operation being performed in parallel (MPYI3) writes to the 
same register, then STI accepts as input the contents of the register before 
it is modified by the MPYI3. 

If src2 and dst2 point to the same location, src2 is read before the write to 
dst2. 

Integer overflow occurs when any of the most significant 16 bits of the 48-bit result differs from the most significant bit of the 32-bit output value.

Cycles 1 

Status Bits LUF Unchanged. 

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is affected by OVM bit value. 
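The read-before-write rule in the Description can be modeled with a simple register file and memory. In this Python sketch, every input is fetched before either result is committed. As a result, STI stores the old contents of src3 even when src3 is also dst1, and src2 is read before the store when src2 and dst2 name the same address. The dictionaries and the function name `mpyi3_sti` are assumptions made for illustration. Addressing-mode side effects, such as auxiliary-register updates, are not modeled.

```python
MASK32 = 0xFFFF_FFFF


def mpyi3_sti(regs: dict[str, int], mem: dict[int, int],
              src1: str, src2_addr: int, dst1: str,
              src3: str, dst2_addr: int) -> None:
    """Model MPYI3 src2, src1, dst1 || STI src3, dst2 in one cycle."""
    # Read phase: every operand is fetched at the start of the cycle.
    a = regs[src1]
    b = mem[src2_addr]
    store_value = regs[src3]
    # Write phase: both results are committed at the end of the cycle.
    regs[dst1] = (a * b) & MASK32
    mem[dst2_addr] = store_value
```

Consider the values in this instruction's example: R5 = 50, a memory operand of 200, and R2 = 220. With those inputs, the sketch sets R7 to 10000 and stores 220.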
Example MPYI3 *++AR0(1),R5,R7
|| STI R2,*-AR3(1)

Before Instruction:

AR0 = 80 995Ah
R5 = 32h = 50
R7 = 0h
R2 = 0DCh = 220
AR3 = 80 982Fh
Data at 80 995Bh = 0C8h = 200
Data at 80 982Eh = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR0 = 80 995Bh
R5 = 32h = 50
R7 = 2710h = 10000
R2 = 0DCh = 220
AR3 = 80 982Fh
Data at 80 995Bh = 0C8h = 200
Data at 80 982Eh = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0
MPYI3||SUBI3 Parallel MPYI3 and SUBI3

Syntax MPYI3 srcA, srcB, dst1
|| SUBI3 srcC, srcD, dst2

Operation srcA × srcB → dst1
|| srcD - srcC → dst2

Operands srcA, srcB, srcC, srcD: any two must be indirect (disp = 0, 1, IR0, IR1), and any two must be register (R0-R7)
dst1 register (d1): 0 = R0, 1 = R1
dst2 register (d2): 0 = R2, 1 = R3
src1 register (R0-R7)
src2 register (R0-R7)
src3 indirect (disp = 0, 1, IR0, IR1)
src4 indirect (disp = 0, 1, IR0, IR1)
P parallel addressing modes (0 ≤ P ≤ 3)

P Field   Operation
00        src3 × src4, src1 - src2
01        src3 × src1, src4 - src2
10        src1 × src2, src3 - src4
11        src3 × src1, src2 - src4

Encoding
Bits  31-30  29-26  25-24  23  22  21-19  18-16  15-8  7-0
      10     0011   P      d1  d2  src1   src2   src3  src4


Description An integer multiplication and an integer subtraction are performed in paral- 
lel. All registers are read at the beginning and loaded at the end of the ex- 
ecute cycle. This means that if one of the parallel operations (MPYI3) reads 
from a register and the operation being performed in parallel (SUBI3) writes 
to the same register, then MPYI3 accepts as input the contents of the regis- 
ter before it is modified by the SUBI3. 
Any combination of addressing modes may be coded for the four possible source operands as long as two are coded as indirect and two are register. The assignment of the source operands srcA-srcD to the src1-src4 fields varies, depending on the combination of addressing modes used; the P field is encoded accordingly. The assembler may, when not significant, change the order of operands in commutative operations in order to simplify processing.

Integer overflow occurs when any of the most significant 16 bits of the 48-bit result differs from the most significant bit of the 32-bit output value.

Cycles 1

Status Bits LUF Unchanged.

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 1 if an integer underflow occurs, 0 otherwise. 

N 0. 

Z 0. 

V 1 if an integer overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is affected by OVM bit value. 



Example MPYI3 R2,*++AR0(1),R0
|| SUBI3 *AR5--(IR1),R4,R2

or

MPYI3 *++AR0(1),R2,R0
|| SUBI3 *AR5--(IR1),R4,R2

Before Instruction:

R2 = 32h = 50
AR0 = 80 98E3h
R0 = 0h
AR5 = 80 99FCh
IR1 = 0Ch
R4 = 07D0h = 2000
Data at 80 98E4h = 62h = 98
Data at 80 99FCh = 4B0h = 1200
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R2 = 320h = 800
AR0 = 80 98E4h
R0 = 01324h = 4900
AR5 = 80 99F0h
IR1 = 0Ch
R4 = 07D0h = 2000
Data at 80 98E4h = 62h = 98
Data at 80 99FCh = 4B0h = 1200
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Multiply Signed Integer and Produce 32 MSBs MPYSHI 



Syntax MPYSHI src, dst 

Operation dst x src -> dst 

Operands src general addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
Bits  31-23      22-21  20-16  15-0
      000111011  G      dst    src



Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode
11   immediate mode



Description The 32 MSBs of the product of the numbers at dst and src are loaded into 
the dst register. These numbers, when read, are assumed to be signed 
32-bit integers. The result is assumed to be a signed 64-bit integer. The 
output to the dst register is the 32 most significant bits of the result. 

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SET COND) = 1, they are modified for all destination registers.
LUF Unchanged. 
LV Unchanged. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if all 64 bits of the product are 0, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
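The following Python sketch models the signed high-word product together with the N and Z rules listed above. The key point is that Z depends on all 64 product bits, not only on the 32 bits written to dst. The function names are illustrative, and registers are modeled as 32-bit Python integers.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mpyshi(dst: int, src: int) -> tuple[int, bool, bool]:
    """Model MPYSHI and return (new dst, N, Z)."""
    product = to_signed32(dst) * to_signed32(src)  # signed 64-bit product
    high = (product >> 32) & MASK32  # Python's >> is arithmetic, keeping the sign
    n = bool(high & 0x8000_0000)     # N: the 32-bit output is negative
    z = product == 0                 # Z: all 64 product bits are zero
    return high, n, z
```

A product of 1 leaves a high word of 0 but still clears Z, because the low half of the product is nonzero. Multiplying -1 (0FFFF FFFFh) by 1 yields FFFF FFFFh, the sign extension of the 64-bit result.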



MPYSHI3 Multiply Signed Integer Producing 32 MSBs, 3 Operands

Syntax MPYSHI3 src2, src1, dst
Operation src1 × src2 → dst

Operands src1 type 1 or type 2 three-operand addressing modes
src2 type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Bits    31-23      22-21  20-16  15-8  7-0
Type 1  001010001  T      dst    src1  src2
Type 2  001110001  T      dst    src1  src2



Instruction Word Fields

Type 1
T    src1 addressing modes                  src2 addressing modes
00   register mode (any CPU register)       register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)  register mode (any CPU register)
10   register mode (any CPU register)       indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)  indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                              src2 addressing modes
00   register mode (any CPU register)                   8-bit signed immediate
01   register mode (any CPU register)                   indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)   8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)  indirect mode *+ARn2(5-bit unsigned displacement)



Description The product of the numbers at the src1 and src2 operands is loaded into the dst register. The numbers at the src1 and src2 operands are assumed to be 32-bit signed integers. The result is assumed to be a signed 64-bit integer. The output to the dst register is the 32 most significant bits of the result.

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SET COND) = 1, they are modified for all destination registers.
LUF Unchanged. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 



MPYUHI Multiply Unsigned Integer and Produce 32 MSBs

Syntax MPYUHI src, dst 

Operation dst x src -> dst 

Operands src general addressing modes 

dst register mode (any register in CPU primary register file) 

Encoding
Bits  31-23      22-21  20-16  15-0
      000111100  G      dst    src



Instruction Word Fields

G    src addressing modes
00   register mode (any CPU register)
01   direct mode
10   indirect mode
11   immediate mode



Description The 32 MSBs of the product of the numbers at dst and src operands are 
loaded into the dst register. These numbers, when read, are assumed to 
be unsigned 32-bit integers. The result is assumed to be an unsigned 64-bit 
integer. The output to the dst register is the 32 most significant bits of the 
result. 

Cycles 1 

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SET COND) = 1, they are modified for all destination registers.
LUF Unchanged. 
LV Unchanged. 
UF 0. 
N 0. 

Z 1 if all 64 bits of the product are 0, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
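MPYUHI returns only the upper half of an unsigned 64-bit product. Pairing it with the 32 LSBs of the same multiplication, as produced by MPYI, recovers the full product. The Python sketch below shows that combination. The function names are hypothetical. Using the two instructions together this way is an illustration, not a sequence prescribed by this entry.

```python
MASK32 = 0xFFFF_FFFF


def mpyuhi(dst: int, src: int) -> int:
    """32 MSBs of the unsigned 64-bit product of two 32-bit values."""
    return ((dst & MASK32) * (src & MASK32)) >> 32


def low32(dst: int, src: int) -> int:
    """32 LSBs of the product (identical for signed or unsigned inputs)."""
    return ((dst & MASK32) * (src & MASK32)) & MASK32


def umul64(a: int, b: int) -> int:
    """Full unsigned 64-bit product assembled from the two halves."""
    return (mpyuhi(a, b) << 32) | low32(a, b)
```

For 0FFFF FFFFh × 1, MPYUHI yields 0, because the operand is treated as the unsigned value 4,294,967,295. A signed high-word multiply would instead treat the same bit pattern as -1.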



Multiply Unsigned Integer Producing 32 MSBs, 3 Operands MPYUHI3 



Syntax MPYUHI3 src2, src1, dst

Operation src1 × src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Bits    31-23      22-21  20-16  15-8  7-0
Type 1  001010010  T      dst    src1  src2
Type 2  001110010  T      dst    src1  src2



Instruction Word Fields

Type 1
T    src1 addressing modes                  src2 addressing modes
00   register mode (any CPU register)       register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)  register mode (any CPU register)
10   register mode (any CPU register)       indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)  indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                              src2 addressing modes
00   register mode (any CPU register)                   8-bit signed immediate
01   register mode (any CPU register)                   indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)   8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)  indirect mode *+ARn2(5-bit unsigned displacement)



Description The product of the numbers at the src1 and src2 operands is loaded into the dst register. The numbers at the src1 and src2 operands are assumed to be 32-bit unsigned integers. The result is assumed to be an unsigned 64-bit integer. The output to the dst register is the 32 most significant bits of the result.



Cycles 1 






Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SET COND) = 1, they are modified for all destination registers.
LUF Unchanged. 
LV Unchanged. 
UF 0. 
N 0. 

Z 1 if all 64 bits of the product are 0, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 



Negate Integer With Borrow NEGB



Syntax NEGB src, dst 
Operation 0 - src - C → dst

Operands src general addressing modes (G): 

0 0 register (any register in CPU primary register file) 

0 1 direct 

1 0 indirect 
1 1 immediate 

dst register (any register in CPU primary register file) 

Encoding
Bits  31-23      22-21  20-16  15-0
      000010110  G      dst    src







Description The difference of the 0, src, and C operands, calculated as shown, is loaded 
into the dst register. The dst and src are assumed to be signed integers. 

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 

Example NEGB R5,R7

Before Instruction:

R5 = 0FFFF FFCBh = -53
R7 = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R5 = 0FFFF FFCBh = -53
R7 = 34h = 52

LUF LV UF N Z V C = 0 0 0 0 0 0 1
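Because NEGB subtracts the carry/borrow bit, it can finish a multiword negation. A first step negates the low word and sets C on a borrow, and NEGB then negates the high word. The Python sketch below models that chain. It assumes that "a borrow occurs" means the unsigned subtraction 0 - src - C goes below zero, which is consistent with C = 1 in the example. The function names are hypothetical, and the 64-bit value is held as two 32-bit words.

```python
MASK32 = 0xFFFF_FFFF


def neg_low(src: int) -> tuple[int, int]:
    """0 - src; returns (result, C). C = 1 when the subtraction borrows."""
    src &= MASK32
    return (-src) & MASK32, int(src != 0)


def negb(src: int, c: int) -> tuple[int, int]:
    """Model NEGB: 0 - src - C, with the same (assumed) borrow rule."""
    src &= MASK32
    return (-src - c) & MASK32, int(src + c != 0)


def neg64(value: int) -> int:
    """Negate a 64-bit value: low word first, then NEGB on the high word."""
    lo, c = neg_low(value & MASK32)
    hi, _ = negb(value >> 32, c)
    return (hi << 32) | lo
```

With the example's inputs, R5 = 0FFFF FFCBh and C = 1, `negb` returns 34h (52) and leaves C set.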
NEGF Negate Floating-Point Value



Syntax NEGF src, dst

Operation 0 - src → dst

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)



Encoding
Bits  31-23      22-21  20-16  15-0
      000010111  G      dst    src





Description The difference of the 0 and src operands is loaded into the dst register. The dst and src operands are assumed to be floating-point numbers.

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 

Example NEGF *++AR3(2),R1

Before Instruction:

AR3 = 80 9800h
R1 = 05 7B40 0025h = 6.28125006e+01
Data at 80 9802h = 70C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR3 = 80 9802h
R1 = 07 F380 0000h = -1.4050e+02
Data at 80 9802h = 70C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
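A small decoder makes it easier to check hexadecimal operands such as 70C 8000h, or the upper 32 bits of R1 (07F3 8000h), in the example. The Python sketch below assumes the 32-bit single-precision format described in Chapter 4. In that format, bits 31-24 are a two's-complement exponent, bit 23 is the sign, and bits 22-0 are the fraction. A positive value is 1.f × 2^e, a negative value is (-2 + 0.f) × 2^e, and exponent -128 encodes zero. The function name is illustrative.

```python
def decode_c4x_float(word: int) -> float:
    """Decode a 32-bit single-precision word in the assumed C4x layout."""
    exp = (word >> 24) & 0xFF
    exp = exp - 256 if exp & 0x80 else exp   # two's-complement exponent
    sign = (word >> 23) & 1
    frac = (word & 0x7F_FFFF) / 2**23        # 0.f
    if exp == -128:
        return 0.0                           # exponent -128 represents zero
    # Positive: 01.f x 2^e; negative: 10.f x 2^e, i.e. (-2 + 0.f) x 2^e.
    mantissa = (-2.0 + frac) if sign else (1.0 + frac)
    return mantissa * 2.0**exp
```

With this decoder, 070C 8000h decodes to 140.5 and 07F3 8000h decodes to -140.5. These values match the source and result of the NEGF example.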



Parallel NEGF and STF NEGF||STF



Syntax NEGF src2, dst1
|| STF src3, dst2

Operation 0 - src2 → dst1
|| src3 → dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
Bits  31-30  29-25  24-22  21-19  18-16  15-8  7-0
      11     10001  dst1   000    src3   dst2  src2



Description A floating-point negation and a floating-point store are performed in parallel. 

All registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (STF) reads from a 
register and the operation being performed in parallel (NEGF) writes to the 
same register, then STF accepts as input the contents of the register before 
it is modified by the NEGF. 



If src2 and dst2 point to the same location, src2 is read before the write to dst2.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if a floating-point overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 






Example NEGF *AR4--(1),R7
|| STF R2,*++AR5(1)

Before Instruction:

AR4 = 80 98E1h
R7 = 0h
R2 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 9803h
Data at 80 98E1h = 57B 4000h = 6.281250e+01
Data at 80 9804h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR4 = 80 98E0h
R7 = 05 84C0 0000h = -6.281250e+01
R2 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 9804h
Data at 80 98E1h = 57B 4000h = 6.281250e+01
Data at 80 9804h = 733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Negate Integer NEGI



Syntax NEGI src, dst

Operation 0 - src → dst

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding
Bits  31-23      22-21  20-16  15-0
      000011000  G      dst    src





Description The difference of the 0 and src operands is loaded into the dst register. The 
dst and src operands are assumed to be signed integers. 

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 

Example NEGI 174,R5 (174 = 0AEh)

Before Instruction:

R5 = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R5 = 0FFFF FF52h = -174
LUF LV UF N Z V C = 0 0 0 1 0 0 1
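The following Python sketch models NEGI with OVM = 0 and shows how each condition flag listed above arises. Two points are worth noting. First, 8000 0000h is the only 32-bit input whose negation overflows, since +2^31 is not representable. Second, C is modeled as the borrow out of the unsigned subtraction 0 - src; that borrow interpretation is an assumption, though it agrees with the example. The function name is illustrative.

```python
MASK32 = 0xFFFF_FFFF


def negi(src: int) -> dict:
    """Model NEGI (OVM = 0): 0 - src, plus the N, Z, V and C flags."""
    src &= MASK32
    result = (-src) & MASK32
    return {
        "dst": result,
        "N": result >> 31,                 # sign bit of the result
        "Z": int(result == 0),
        "V": int(src == 0x8000_0000),      # -(-2^31) does not fit in 32 bits
        "C": int(src != 0),                # borrow from 0 - src (assumed rule)
    }
```

For the example operand 174, the sketch returns 0FFFF FF52h with N = 1 and C = 1, matching the flags shown after the instruction.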
NEGI||STI Parallel NEGI and STI

Syntax NEGI src2, dst1
|| STI src3, dst2

Operation 0 - src2 → dst1
|| src3 → dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
Bits  31-30  29-25  24-22  21-19  18-16  15-8  7-0
      11     10010  dst1   000    src3   dst2  src2



Description An integer negation and an integer store are performed in parallel. All registers are read at the beginning and loaded at the end of the execute cycle. This means that if one of the parallel operations (STI) reads from a register and the operation being performed in parallel (NEGI) writes to the same register, then STI accepts as input the contents of the register before it is modified by the NEGI.

If src2 and dst2 point to the same location, src2 is read before the write to 
dst2. 



Cycles 1

Status Bits LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise.

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 



Example NEGI *-AR3,R2
|| STI R2,*AR1++

Before Instruction:

AR3 = 80 982Fh
R2 = 19h = 25
AR1 = 80 98A5h
Data at 80 982Eh = 0DCh = 220
Data at 80 98A5h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR3 = 80 982Fh
R2 = 0FFFF FF24h = -220
AR1 = 80 98A6h
Data at 80 982Eh = 0DCh = 220
Data at 80 98A5h = 19h = 25
LUF LV UF N Z V C = 0 0 0 1 0 0 1
NOP No Operation

Syntax NOP src

Operation No ALU or multiplier operations.
ARn is modified if src is specified in indirect mode.

Operands src general addressing modes (G):
0 0 register (no operation)
1 0 indirect (modify ARn, 0 ≤ n ≤ 7)

Encoding
Bits  31-23      22-21  20-16  15-0
      000011001  G      00000  src





Description If the src operand is specified in the indirect mode, the specified addressing 
operation is performed and a dummy memory read occurs. If the src oper- 
and is omitted, no operation is performed. 

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example NOP

Before Instruction:

PC = 3Ah

After Instruction:

PC = 3Bh

Example NOP *AR3--(1)

Before Instruction:

PC = 5h
AR3 = 80 9900h

After Instruction:

PC = 6h
AR3 = 80 98FFh



Normalize NORM 



Syntax NORM src, dst

Operation norm(src) → dst

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate

dst register (R0-R11)

Encoding
Bits  31-23      22-21  20-16  15-0
      000011010  G      dst    src





Description The src operand is assumed to be an unnormalized floating-point number; i.e., the implied bit is set equal to the sign bit. The dst is set equal to the normalized src operand with the implied bit removed. The dst operand exponent is set to the src operand exponent minus the size of the left shift necessary to normalize the src. The dst operand is assumed to be a normalized floating-point number.

For values of src:

□ If src (exp) = -128 and src (man) = 0, then dst = 0, Z = 1, and UF = 0.

□ If src (exp) = -128 and src (man) ≠ 0, then dst = 0, Z = 0, and UF = 1.

□ For all other cases of the src, if a floating-point underflow occurs, then dst (man) is forced to 0 and dst (exp) = -128. If src (man) = 0, then dst (man) = 0 and dst (exp) = -128. Refer to Section 4.7 on page 4-24.

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 
LV Unaffected. 

UF 1 if a floating-point underflow occurs, 0 otherwise. 
N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 0. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example NORM R1,R2

Before Instruction:

R1 = 04 0000 3AF5h
R2 = 07 0C80 0000h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R1 = 04 0000 3AF5h
R2 = F2 6BD4 0000h = 1.12451613e-04
LUF LV UF N Z V C = 0 0 0 0 0 0 0
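The normalization rule can be traced bit by bit. The Python sketch below assumes the extended-precision layout, which has an 8-bit exponent plus a 32-bit mantissa word with the sign in bit 31 and the fraction in bits 30-0. It passes the exponent and the mantissa word separately. The steps are as follows. Redundant copies of the sign bit are shifted out. One more shift moves the first differing bit into the implied-bit position. The total shift is then subtracted from the exponent. Underflow forces the result to zero, as described above. The function name and argument split are assumptions, and flag outputs other than underflow are not modeled.

```python
MASK32 = 0xFFFF_FFFF


def norm(exp: int, man: int) -> tuple[int, int, bool]:
    """Model NORM; returns (exp, man, underflow)."""
    man &= MASK32
    if exp == -128:
        # Exponent -128 encodes zero; a nonzero mantissa flags underflow.
        return -128, 0, man != 0
    if man == 0:
        return -128, 0, False
    sign = man >> 31
    shift = 0
    # Shift out redundant sign copies until the bit just below the sign
    # differs from it (bit 31 keeps the sign throughout these shifts).
    while ((man >> 30) & 1) == sign:
        man = (man << 1) & MASK32
        shift += 1
    # One more shift moves that bit into the implied-bit position (bit 31).
    man = (man << 1) & MASK32
    shift += 1
    new_exp = exp - shift
    if new_exp < -127:
        return -128, 0, True  # floating-point underflow: forced to zero
    # Drop the implied bit and restore the sign in bit 31.
    return new_exp, (sign << 31) | (man & 0x7FFF_FFFF), False
```

For the example operand, the leading 1 of 3AF5h sits in bit 13, so the total left shift is 18. `norm(4, 0x00003AF5)` therefore returns exponent -14 (F2h) and mantissa 6BD4 0000h, matching R2.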



Bitwise Logical Complement NOT 



Syntax NOT src, dst

Operation ~src → dst

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding
Bits  31-23      22-21  20-16  15-0
      000011011  G      dst    src



Description The bitwise logical complement of the src operand is loaded into the dst register. The complement is formed by a logical NOT of each bit of the src operand. The dst and src operands are assumed to be unsigned integers.

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero result is generated, 0 otherwise. 

V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value.

Example NOT @982Ch,R4

Before Instruction:

DP = 80h
R4 = 0h
Data at 80 982Ch = 5E2Fh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R4 = 0FFFF A1D0h
Data at 80 982Ch = 5E2Fh
LUF LV UF N Z V C = 0 0 0 1 0 0 0
NOT||STI Parallel NOT and STI

Syntax NOT src2, dst1
|| STI src3, dst2

Operation ~src2 → dst1
|| src3 → dst2

Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
Bits  31-30  29-25  24-22  21-19  18-16  15-8  7-0
      11     10011  dst1   000    src3   dst2  src2



Description A bitwise logical NOT and an integer store are performed in parallel. All reg- 
isters are read at the beginning and loaded at the end of the execute cycle. 
This means that if one of the parallel operations (STI) reads from a register 
and the operation being performed in parallel (NOT) writes to the same reg- 
ister, then STI accepts as input the contents of the register before it is modi- 
fied by the NOT. 



If src2 and dst2 point to the same location, src2 is read before the write to dst2.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.






Example NOT *+AR2,R3
|| STI R7,*--AR4(IR1)

Before Instruction:

AR2 = 80 99CBh
R3 = 0h
R7 = 0DCh = 220
AR4 = 80 9850h
IR1 = 10h
Data at 80 99CCh = 0C2Fh
Data at 80 9840h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR2 = 80 99CBh
R3 = 0FFFF F3D0h
R7 = 0DCh = 220
AR4 = 80 9840h
IR1 = 10h
Data at 80 99CCh = 0C2Fh
Data at 80 9840h = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 1 0 0 0
OR Bitwise Logical OR

Syntax OR src, dst

Operation dst OR src → dst

Operands src general addressing modes (G):
0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate (not sign-extended)

dst register (any register in CPU primary register file)

Encoding
Bits  31-23      22-21  20-16  15-0
      000100000  G      dst    src







Description The bitwise logical OR between the src and dst operands is loaded into the dst register. The dst and src operands are assumed to be unsigned integers.

Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST (SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example OR *++AR1(IR1),R2

Before Instruction:

AR1 = 80 9800h
IR1 = 4h
R2 = 01256 0000h
Data at 80 9804h = 2BCDh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 9804h
IR1 = 4h
R2 = 01256 2BCDh
Data at 80 9804h = 2BCDh
LUF LV UF N Z V C = 0 0 0 0 0 0 0






Bitwise Logical OR, 3 Operands OR3



Syntax OR3 src2, src1, dst

Operation src1 | src2 -> dst (| = OR)

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1 (bits 31 to 0): 001001011 | T | dst | src1 | src2
Type 2 (bits 31 to 0): 001101011 | T | dst | src1 | src2



Instruction Word Fields

Type 1
T    src1 addressing modes                    src2 addressing modes
00   register mode (any CPU register)         register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)    register mode (any CPU register)
10   register mode (any CPU register)         indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)    indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                               src2 addressing modes
00   register mode (any CPU register)                    8-bit signed immediate
01   register mode (any CPU register)                    indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)    8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)   indirect mode *+ARn2(5-bit unsigned displacement)




Description The bitwise logical OR between the numbers at the src1 and src2 operands
is loaded into the dst register. The numbers at the src1, src2, and dst
operands are assumed to be unsigned integers.

Cycles 1

Status Bits If ST (SETCOND) = 0, the condition flags are modified if the destination
register is R0-R11. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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Assembly Language Instructions 



Parallel OR3 and STI OR3||STI



Syntax OR3 src2, src1, dst1
|| STI src3, dst2

Operation src1 OR src2 -> dst1
|| src3 -> dst2

Operands src1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding (bits 31 to 0): 11 | 10100 | dst1 | src1 | src3 | dst2 | src2



Description A bitwise logical OR and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute cycle.
This means that if one of the parallel operations (STI) reads from a register
and the operation being performed in parallel (OR3) writes to the same
register, then STI accepts as input the contents of the register before it is
modified by the OR3.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.



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0R3| |STI Parallel OR3 and STI 



Example OR3 *++AR2,R5,R2
|| STI R6,*AR1--

Before Instruction:

AR2 = 80 9830h
R5 = 80 0000h
R2 = 0h
R6 = 0DCh = 220
AR1 = 80 9883h
Data at 80 9831h = 9800h
Data at 80 9883h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR2 = 80 9831h
R5 = 80 0000h
R2 = 80 9800h
R6 = 0DCh = 220
AR1 = 80 9882h
Data at 80 9831h = 9800h
Data at 80 9883h = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



POP Integer POP 



Syntax POP dst 

Operation *SP-- -> dst

Operands dst register (any register in CPU primary register file)

Encoding (bits 31 to 0): 000 | 011100 | 01 | dst | 0000 0000 0000 0000



Description The top of the current system stack is popped and loaded into the dst
register. The top of the stack is assumed to be a signed integer. The POP is
performed with a post-decrement of the stack pointer.

Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
Example POP R3

Before Instruction:

SP = 80 9856h
R3 = 012DAh = 4,826
Data at 80 9856h = 0FFFF 0DA4h = -62,044
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

SP = 80 9855h
R3 = 0FFFF 0DA4h = -62,044
Data at 80 9856h = 0FFFF 0DA4h = -62,044
LUF LV UF N Z V C = 0 0 0 1 0 0 0
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POPF POP Floating-Point Value 



Syntax POPF dst 

Operation *SP-- -> dst

Operands dst register (R0-R11)

Encoding (bits 31 to 0): 000 | 011101 | 01 | dst | 0000 0000 0000 0000



Description The top of the current system stack is popped and loaded into the dst
register. The top of the stack is assumed to be a floating-point number. The
POP is performed with a post-decrement of the stack pointer.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example POPF R4

Before Instruction:

SP = 80 984Ah
R4 = 02 5D2E 0123h = 6.91186578e+00
Data at 80 984Ah = 5F2C 1302h = 5.32544007e+28
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

SP = 80 9849h
R4 = 5F 2C13 0200h = 5.32544007e+28
Data at 80 984Ah = 5F2C 1302h = 5.32544007e+28
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



PUSH Integer PUSH 



Syntax PUSH src 

Operation src -> *++SP

Operands src register (any register in CPU primary register file)

Encoding (bits 31 to 0): 000 | 011110 | 01 | src | 0000 0000 0000 0000



Description The contents of the src register are pushed onto the current system stack.
The src is assumed to be a signed integer. The PUSH is performed with a
preincrement of the stack pointer.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example PUSH R6

Before Instruction:

SP = 80 98AEh
R6 = 815Bh = 33,115
Data at 80 98AFh = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

SP = 80 98AFh
R6 = 815Bh = 33,115
Data at 80 98AFh = 815Bh = 33,115
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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PUSHF PUSH Floating-Point Value 



Syntax PUSHF src

Operation src -> *++SP

Operands src register (R0-R11)

Encoding (bits 31 to 0): 000 | 011111 | 01 | src | 0000 0000 0000 0000



Description The contents of the src register are pushed onto the current system stack.
The src is assumed to be a floating-point number. The PUSH is performed
with a preincrement of the stack pointer.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
Example PUSHF R2

Before Instruction:

SP = 80 9801h
R2 = 02 5C12 8081h = 6.87725854e+00
Data at 80 9802h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

SP = 80 9802h
R2 = 02 5C12 8081h = 6.87725854e+00
Data at 80 9802h = 025C 1280h = 6.87725830e+00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



Reciprocal of Floating-Point Value RCPF 



Syntax RCPF src, dst 

Operation 16-bit reciprocal of src -> dst

Operands src extended-precision register, direct and indirect addressing modes
dst R0-R11

Encoding (bits 31 to 0): 000 | 111010 | G | dst | src







Instruction Word Fields

G    src addressing modes
00   extended-precision register (R0-R11)
01   direct mode
10   indirect mode



Description The 16-bit approximation of the reciprocal of the src operand is loaded into
the dst register. The dst and src operands are assumed to be floating-point
numbers.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.



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RETICOnd Return From Interrupt or Trap Conditionally 



Syntax RETIcond

Operation If (cond is true):
*SP-- -> PC
ST(PGIE) -> ST(GIE)
ST(PCF) -> ST(CF)
Else, continue

Operands None

Encoding (bits 31 to 0): 01111 000000 | cond | 0000 0000 0000 0000



Description If the condition is true, the top of the stack is popped to the PC, PGIE is
copied to GIE, and PCF is copied to CF. If the condition is not true, normal
operation continues (see Section 11.2 on page 11-10 for a list of condition
mnemonics, encoding, and flags).

Cycles 4

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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Assembly Language Instructions 



Return From Interrupt or Trap Conditionally Delayed RETIcondD



Syntax RETIcondD

Operation If (cond is true):
*SP-- -> PC
ST(PGIE) -> ST(GIE)
ST(PCF) -> ST(CF)
Else, continue

Operands None

Encoding (bits 31 to 0): 01111 000001 | cond | 0000 0000 0000 0000



Description Performs a delayed return from an interrupt or trap.

Because this is a delayed return, the three instructions following the
RETIcondD are fetched and executed. These three instructions may neither
modify the program flow nor load the status register (see Section 11.2 on
page 11-10 for a list of condition mnemonics, encoding, and flags).

Interrupts are disabled for the duration of the RETIcondD.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.



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RETScond Return from Subroutine Conditionally 



Syntax RETScond

Operation If cond is true:
*SP-- -> PC
Else, continue.

Operands None

Encoding (bits 31 to 0): 01111 000100 | cond | 0000 0000 0000 0000



Description A conditional return is performed. If the condition is true, the top of the stack
is popped to the PC.

The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).

Cycles 4

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example RETSGE

Before Instruction:

PC = 123h
SP = 80 983Ch
Data at 80 983Ch = 456h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

PC = 456h
SP = 80 983Bh
Data at 80 983Ch = 456h
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



Round Floating-Point Value RND 



Syntax RND src, dst

Operation rnd(src) -> dst

Operands src general addressing modes (G):
0 0 register (R0-R11)
0 1 direct
1 0 indirect
1 1 immediate
dst register (R0-R11)

Encoding (bits 31 to 0): 000 | 100010 | G | dst | src







Description The result of rounding the src operand is loaded into the dst register. The src
operand is rounded to the nearest single-precision floating-point value. If
the src operand is exactly halfway between two single-precision values, it
is rounded to the more positive of those values.



Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is affected by OVM bit value.

Example RND R5,R2

Before Instruction:

R5 = 07 33C1 6EEFh = 1.79755599e+02
R2 = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R5 = 07 33C1 6EEFh = 1.79755599e+02
R2 = 07 33C1 6F00h = 1.79755600e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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ROL Rotate Left 



Syntax ROL dst 

Operation dst left-rotated 1 bit -> dst

Operands dst register (any register in CPU primary register file)

Encoding (bits 31 to 0): 000 | 100011 | 11 | dst | 0000 0000 0000 0001



Description The contents of the dst operand are left-rotated one bit and loaded into the
dst register. This is a circular rotate with the MSB transferred into the LSB.

Rotate left: C <- [MSB ... dst ... LSB] <- old MSB

Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. Unaffected
if dst is not R0-R7.

Mode Bit OVM Operation is not affected by OVM bit value.

Example ROL R3

Before Instruction:

R3 = 8002 5CD4h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R3 = 0004 B9A9h
LUF LV UF N Z V C = 0 0 0 0 0 0 1
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Assembly Language Instructions 



Rotate Left Through Carry ROLC 



Syntax ROLC dst 

Operation dst left-rotated 1 bit through carry bit -> dst
Operands dst register (any register in CPU primary register file)

Encoding (bits 31 to 0): 000 | 100100 | 11 | dst | 0000 0000 0000 0001



Description The contents of the dst operand are left-rotated one bit through the carry bit
and loaded into the dst register. The MSB is rotated into the carry bit while
the carry bit is transferred into the LSB.

Rotate left through carry bit: C <- [MSB ... dst ... LSB] <- old C



Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. If dst is not
R0-R7, then C is shifted into the dst but not changed.

Mode Bit OVM Operation is not affected by OVM bit value.



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ROLC Rotate Left Through Carry 



Example ROLC R3

Before Instruction:

R3 = 0000 0420h
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R3 = 0000 0841h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

Example ROLC R3

Before Instruction:

R3 = 8000 4281h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R3 = 0000 8502h
LUF LV UF N Z V C = 0 0 0 0 0 0 1
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Assembly Language Instructions 



Rotate Right ROR 



Syntax ROR dst 

Operation dst right-rotated 1 bit -> dst

Operands dst register (any register in CPU primary register file)
Encoding (bits 31 to 0): 000 | 100101 | 11 | dst | 1111 1111 1111 1111



Description The contents of the dst operand are right-rotated one bit and loaded into the
dst register. The LSB is rotated into the carry bit and also transferred into
the MSB.

Rotate right: old LSB -> [MSB ... dst ... LSB] -> C



Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the low-order bit. Unaffected
if dst is not R0-R7.

Mode Bit OVM Operation is not affected by OVM bit value.

Example ROR R7

Before Instruction:

R7 = 0000 0421h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 8000 0210h
LUF LV UF N Z V C = 0 0 0 1 0 0 1
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RORC Rotate Right Through Carry 



Syntax RORC dst

Operation dst right-rotated 1 bit through carry bit -> dst
Operands dst register (any register in CPU primary register file)
Encoding (bits 31 to 0): 000 | 100110 | 11 | dst | 1111 1111 1111 1111



Description The contents of the dst operand are right-rotated one bit through the status
register's carry bit. This can be viewed as a 33-bit rotate. The carry bit value
is rotated into the MSB of the dst; at the same time, the dst LSB is rotated
into the carry bit.

Rotate right through carry bit: old C -> [MSB ... dst ... LSB] -> C

Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the low-order bit. If dst is not
R0-R7, then C is shifted in but not changed.

Mode Bit OVM Operation is not affected by OVM bit value.

Example RORC R4

Before Instruction:

R4 = 8000 0081h
LUF LV UF N Z V C = 0 0 0 1 0 0 0

After Instruction:

R4 = 4000 0040h
LUF LV UF N Z V C = 0 0 0 0 0 0 1
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Assembly Language Instructions 



Repeat Block RPTB 



Syntax RPTB src 

Operation src + PC + 1 -> RE
1 -> ST(RM)
Next PC -> RS

Operands src 24-bit signed immediate displacement or register mode
Encoding
For 24-bit signed immediate (displacement) mode (bits 31 to 0): 01100100 | src (displacement)
For register mode (bits 31 to 0): 011110010 | 000000000000000000 | src



Description RPTB allows a block of instructions to be repeated a number of times
without any penalty for looping.

It activates the block repeat mode of updating the PC. The src operand may
be a 32-bit register value or a 24-bit signed immediate value (displacement).
The resulting src address is the end address of the block to be repeated.
This address is loaded into the repeat end address (RE) register. A 1 is
written into the repeat mode bit of the status register (ST(RM)) to indicate
that the PC is to be updated in the repeat mode. The address of the next
instruction is loaded into the repeat start address (RS) register.

Cycles 4

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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RPTBD Repeat Block Delayed 



Syntax RPTBD src 

Operation If src is an immediate value (displacement):
src + PC + 3 -> RE
Else:
src -> RE
1 -> ST(RM)
PC of RPTBD + 4 -> RS

Operands src 24-bit signed immediate displacement or register mode
Encoding
For 24-bit signed immediate (displacement) mode (bits 31 to 0): 01100101 | src (displacement)
For register mode (bits 31 to 0): 011110011 | 000000000000000000 | src



Description RPTBD allows a block of instructions to be repeated a number of times
without any penalty for looping and with single-cycle execution of the RPTBD
instruction.

It activates the block repeat mode of updating the PC. The src operand may
be a 32-bit register value or a 24-bit signed immediate value (displacement).
The resulting src address is loaded into the repeat end address (RE) register
(block end address). A 1 is written to the status-register repeat mode bit
(ST(RM)), indicating that the PC is to be updated in the repeat mode. The
address of the next instruction + 3 is loaded into the repeat start address
(RS) register.

RPTBD does not flush the pipeline. The three instructions following RPTBD
are executed; none of them may modify the program flow. These three
instructions are not part of the block that is repeated.

Cycles 1

Status Bits LUF Unaffected. 

LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
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Assembly Language Instructions 



Repeat Single RPTS 



Syntax RPTS src

Operation src -> RC
1 -> ST(RM)
1 -> S
Next PC -> RS
Next PC -> RE

Operands src general addressing modes (G):
0 0 register
0 1 direct
1 0 indirect
1 1 immediate

Encoding (bits 31 to 0): 000 | 100111 | G | 11011 | src







Description The RPTS instruction allows a single instruction to be repeated a number
of times without any penalty for looping. Fetches can also be made from the
instruction register (IR), thus avoiding repeated memory access.

The src operand is loaded into the repeat counter (RC). A 1 is written into
the repeat mode bit of the status register ST (RM). A 1 is also written into
the repeat single bit (S). This indicates that the program fetches are to be
performed only from the instruction register. The next PC is loaded into the
repeat end address (RE) register and the repeat start address (RS) register.

For the immediate mode, the src operand is assumed to be an unsigned
integer and is not sign-extended.

Cycles 4

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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RPTS Repeat Single 

Example RPTS AR5

Before Instruction:

PC = 123h
ST = 0h
RS = 0h
RE = 0h
RC = 0h
AR5 = 0FFh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

PC = 124h
ST = 100h
RS = 124h
RE = 124h
RC = 0FFh
AR5 = 0FFh
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



Reciprocal of Square Root Floating-Point Value RSQRF 



Syntax RSQRF src, dst 

Operation 16-bit reciprocal of the square root of src -> dst

Operands src extended-precision register, direct and indirect addressing modes
dst extended-precision register

Encoding (bits 31 to 0): 000 | 111001 | G | dst | src





Instruction Word Fields

G    src addressing modes
00   extended-precision register (R0-R11)
01   direct mode
10   indirect mode



Description The 16-bit approximation of the reciprocal of the square root of the number
at the src operand is loaded into the dst register. The number at the src
operand is assumed to be positive; the operation for negative inputs is
undefined.

The values at the dst and src operands are assumed to be floating-point
numbers.

Cycles 1

Status Bits LUF Unchanged.
LV 1 if input is zero, unchanged otherwise.
UF 0.
N 0.
Z 0.
V 1 if input is zero, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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SIGI Signal, Interlocked 



Syntax SIGI src, dst 

Operation LOCK (or LLOCK) pin brought low.
src -> dst
LOCK (or LLOCK) pin brought high.

Operands src direct and indirect addressing modes (assumed to be signed integer)
dst register mode (assumed to be signed integer)

Encoding (bits 31 to 0): 000 | 101100 | G | dst | src





Instruction Word Fields

G    src addressing modes
01   direct mode
10   indirect mode



Description An interlocking operation is signaled using the appropriate bus-lock signal
(LOCK or LLOCK) if and only if an external memory access is performed.
The src and dst operands are assumed to be signed integers. After the read
is performed, the bus-lock signal is deasserted. If an internal memory
access is performed, SIGI performs the read but does not assert a bus-lock
signal.

Cycles 1

Status Bits If ST (SETCOND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
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Assembly Language Instructions 



Store Floating-Point Value STF 



Syntax STF src, dst

Operation src -> dst

Operands src register (R0-R11)
dst general addressing modes (G):
0 1 direct
1 0 indirect

Encoding (bits 31 to 0): 000 | 101000 | G | src | dst







Description The src register is loaded into the dst memory location. The src and dst
operands are assumed to be floating-point numbers.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
Example STF R2,@98A1h

Before Instruction:

DP = 80h
R2 = 05 2C50 1900h = 4.30782204e+01
Data at 80 98A1h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R2 = 05 2C50 1900h = 4.30782204e+01
Data at 80 98A1h = 052C 5019h = 4.30782204e+01
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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STFI Store Floating-Point Value, Interlocked 



Syntax STFI src, dst 

Operation src -> dst
Signal end of interlocked operation.

Operands src register (R0-R11)
dst general addressing modes (G):
0 1 direct
1 0 indirect

Encoding (bits 31 to 0): 000 | 101001 | G | src | dst



Description The src register is loaded into the dst memory location. An interlocked
operation is signaled over LOCK or LLOCK. The src and dst operands are
assumed to be floating-point numbers. Refer to Section 7.7 on page 7-39 for
detailed information.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example STFI R3,*-AR4

Before Instruction:

R3 = 07 33C0 0000h = 1.79750e+02
AR4 = 80 993Ch
Data at 80 993Bh = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R3 = 07 33C0 0000h = 1.79750e+02
AR4 = 80 993Ch
Data at 80 993Bh = 0733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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Assembly Language Instructions 



Parallel Store Floating-Point Value STF||STF



Syntax STF src2, dst2
|| STF src1, dst1

Operation src2 -> dst2
|| src1 -> dst1

Operands src1 register (Rn1, 0 ≤ n1 ≤ 7)
dst1 indirect (disp = 0, 1, IR0, IR1)
src2 register (Rn2, 0 ≤ n2 ≤ 7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding (bits 31 to 0): 11 | 00000 | src2 | 000 | src1 | dst1 | dst2



Description Two STF instructions are executed in parallel. Both src1 and src2 are
assumed to be floating-point numbers.

Cycles 1 



Status Bits LUF Unaffected.

LV Unaffected.

UF Unaffected.

N Unaffected.

Z Unaffected.

V Unaffected.

C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 






Example STF R4,*AR3--
|| STF R3,*++AR5

Before Instruction:

R4 = 07 0C80 0000h = 1.4050e+02
AR3 = 80 9835h

R3 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 99D2h
Data at 80 9835h = 0h
Data at 80 99D3h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R4 = 07 0C80 0000h = 1.4050e+02
AR3 = 80 9834h

R3 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 99D3h

Data at 80 9835h = 070C 8000h = 1.4050e+02
Data at 80 99D3h = 0733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Store Integer STI 



Syntax STI src, dst

Operation src -> dst

Operands src register (any register in CPU primary register file)

dst general addressing modes (G):
0 1 direct
1 0 indirect

Encoding

  31-23       22-21   20-16   15-0
  000101010   G       src     dst



Description The src register is loaded into the dst memory location. The src and dst
operands are assumed to be signed integers.
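The field layout in the Encoding diagram can be illustrated by packing an instruction word in C. This is a hedged sketch with several assumptions. The helper name is hypothetical. The opcode constants are the bits 28-23 values read from the STFI, STI, and STII encodings in this chapter. Register R4 is assumed to be encoded as register number 4.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values (bits 28-23) read from the STFI, STI, and STII encodings. */
enum { OP_STFI = 0x29, OP_STI = 0x2A, OP_STII = 0x2B };

/* G field values used by these store instructions. */
enum { G_DIRECT = 1, G_INDIRECT = 2 };

/* Hypothetical helper: pack a general-format store instruction word.
 * Bits 31-29 are 000, bits 28-23 the opcode, bits 22-21 G,
 * bits 20-16 the source register number, bits 15-0 the dst field
 * (a 16-bit direct address or an indirect-addressing field). */
uint32_t encode_g_store(uint32_t opcode, uint32_t g,
                        uint32_t src_reg, uint32_t dst_field)
{
    assert(opcode < 64u && g < 4u && src_reg < 32u && dst_field <= 0xFFFFu);
    return (opcode << 23) | (g << 21) | (src_reg << 16) | dst_field;
}
```

Under those assumptions, `encode_g_store(OP_STI, G_DIRECT, 4, 0x982B)` yields 1524982Bh for STI R4,@982Bh. An assembler listing remains the authoritative check.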

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected.

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example STI R4,@982Bh

Before Instruction: 

DP = 80h 

R4 = 42BD7h = 273,367
Data at 80 982Bh = 0E5FCh = 58,876
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h

R4 = 42BD7h = 273,367

Data at 80 982Bh = 42BD7h = 273,367

LUF LV UF N Z V C = 0 0 0 0 0 0 0
STII Store Integer, Interlocked



Syntax STII src, dst 

Operation src -> dst 

Signal end of interlocked operation. 

Operands src register (any register in CPU primary register file) 

dst general addressing modes (G): 
0 1 direct 
1 0 indirect 

Encoding

  31-23       22-21   20-16   15-0
  000101011   G       src     dst





Description The src register is loaded into the dst memory location. An interlocked
operation is signaled over LOCK or LLOCK. The src and dst operands are
assumed to be signed integers. Refer to Section 7.7 on page 7-39 for detailed
information.



Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 

Example STII R1,@98AEh

Before Instruction: 

DP = 80h 
R1 = 78Dh

Data at 80 98AEh = 25Ch 

After Instruction: 

DP = 80h 
R1 = 78Dh 

Data at 80 98AEh = 78Dh



Parallel STI and STI STI||STI 



Syntax STI src2, dst2
|| STI src1, dst1

Operation src2 -> dst2
|| src1 -> dst1

Operands src1 register (R0 - R7)

dst1 indirect (disp = 0, 1, IR0, IR1)

src2 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding

  31-30   29-25   24-22   21-19   18-16   15-8   7-0
  11      00001   src2    000     src1    dst1   dst2



Description Two integer stores are performed in parallel. If both stores are executed to 
the same address, the value written is that of STI src2, dst2. 
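The same-address rule can be modeled in a few lines of C. In this sketch (the function name is hypothetical), the store for STI src2, dst2 is applied last so that it wins a collision. This reproduces the documented result; it does not describe how the hardware sequences the two writes internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of STI src2, dst2 || STI src1, dst1 on a word-addressed memory.
 * Register values are passed in already read. When dst1 == dst2, the
 * documented winner is STI src2, dst2, so that store is written last. */
void sti_parallel(uint32_t *mem,
                  uint32_t src2_val, uint32_t dst2,
                  uint32_t src1_val, uint32_t dst1)
{
    assert(mem != NULL);
    mem[dst1] = src1_val;
    mem[dst2] = src2_val;   /* written last, so it survives an address collision */
}
```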



Cycles 1



Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example STI R0,*++AR2(IR0)
|| STI R5,*AR0

Before Instruction:

R0 = 0DCh = 220
AR2 = 80 9830h
IR0 = 8h
R5 = 35h = 53
AR0 = 80 98D3h
Data at 80 9838h = 0h
Data at 80 98D3h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 0DCh = 220

AR2 = 80 9838h

IR0 = 8h

R5 = 35h = 53

AR0 = 80 98D3h

Data at 80 9838h = 0DCh = 220

Data at 80 98D3h = 35h = 53

LUF LV UF N Z V C = 0 0 0 0 0 0 0



Store Integer Immediate Value STIK 



Syntax STIK src, dst

Operation src -> dst

Operands src 5-bit signed integer
dst direct and indirect mode

Encoding

  31-23       22-21   20-16   15-0
  000101010   G       src     dst

Instruction Word Fields

  G    dst addressing modes
  00   direct mode
  11   indirect mode



Description The 5-bit signed integer src value is loaded into the dst memory location. 
The src and dst operands are assumed to be signed integers. 
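Because src is a 5-bit signed field, STIK can store only constants from -16 to 15. The C sketch below shows the range check and the sign extension. It assumes the 32-bit memory word receives the sign-extended value, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Can this constant be encoded in STIK's 5-bit signed src field? */
int fits_stik(int32_t value)
{
    return value >= -16 && value <= 15;
}

/* Sign-extend a raw 5-bit field to the 32-bit integer assumed to be stored. */
int32_t stik_stored_value(uint32_t imm5)
{
    assert(imm5 < 32u);                          /* field is 5 bits wide */
    return (imm5 & 0x10u) ? (int32_t)imm5 - 32   /* bit 4 set: negative value */
                          : (int32_t)imm5;
}
```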

Cycles 1 



Status Bits LUF Unaffected.

LV Unaffected.

UF Unaffected.

N Unaffected.

Z Unaffected.

V Unaffected.

C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 



SUBB Subtract Integer With Borrow



Syntax SUBB src, dst

Operation dst - src - C -> dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding

  31-23       22-21   20-16   15-0
  000101101   G       dst     src







Description The difference of the dst, src, and C operands, as calculated above, is
loaded into the dst register. The dst and src operands are assumed to be
signed integers.
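A common use of SUBB is multiprecision arithmetic. SUBI subtracts the low words and sets C on a borrow; SUBB then subtracts the high words and that borrow. The C sketch below models this sequence for a 64-bit value held as two 32-bit words. It tracks only the C flag (V, LV, and OVM saturation are not modeled), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit words. */
typedef struct { uint32_t lo, hi; } u64_words;

/* Model of SUBI on the low words: C = 1 if a borrow occurs. */
uint32_t subi_c(uint32_t dst, uint32_t src, unsigned *c)
{
    *c = dst < src;
    return dst - src;                                /* wraps modulo 2^32 */
}

/* Model of SUBB: dst - src - C -> dst, C = 1 if a borrow occurs. */
uint32_t subb_c(uint32_t dst, uint32_t src, unsigned *c)
{
    uint32_t borrow_in = *c;
    *c = (uint64_t)dst < (uint64_t)src + borrow_in;  /* borrow out */
    return dst - src - borrow_in;
}

u64_words sub64(u64_words a, u64_words b)
{
    unsigned c;
    u64_words r;
    r.lo = subi_c(a.lo, b.lo, &c);   /* low words first, producing C */
    r.hi = subb_c(a.hi, b.hi, &c);   /* high words consume C */
    return r;
}
```

With C = 1, `subb_c(250, 199, &c)` returns 50 and clears C, which matches the SUBB example that follows.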

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 

Example SUBB *AR5++(4),R5 

Before Instruction:

AR5 = 80 9800h

R5 = 0FAh = 250

Data at 80 9800h = 0C7h = 199

LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

AR5 = 80 9804h
R5 = 032h = 50

Data at 80 9800h = 0C7h = 199

LUF LV UF N Z V C = 0 0 0 0 0 0 0 



Subtract Integer With Borrow, 3 Operands SUBB3 



Syntax SUBB3 src2, src1, dst
Operation src1 - src2 - C -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding
Type 1

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     0    01100   T       dst     src1   src2

Type 2

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     1    01100   T       dst     src1   src2

Instruction Word Fields

Type 1

  T    src1 addressing modes                     src2 addressing modes
  00   register mode (any CPU register)          register mode (any CPU register)
  01   indirect mode (disp = 0, 1, IR0, IR1)     register mode (any CPU register)
  10   register mode (any CPU register)          indirect mode (disp = 0, 1, IR0, IR1)
  11   indirect mode (disp = 0, 1, IR0, IR1)     indirect mode (disp = 0, 1, IR0, IR1)

Type 2

  T    src1 addressing modes                     src2 addressing modes
  00   register mode (any CPU register)          8-bit signed immediate
  01   register mode (any CPU register)          indirect mode *+ARn (5-bit unsigned displacement)
  10   indirect mode *+ARn (5-bit unsigned       8-bit signed immediate
       displacement)
  11   indirect mode *+ARn1 (5-bit unsigned      indirect mode *+ARn2 (5-bit unsigned
       displacement)                             displacement)



Description The difference of the src1 and src2 operands and the C (carry) flag is loaded
into the dst register. The src1, src2, and dst operands are assumed to be
signed integers.

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow is generated, 0 otherwise. 




Mode Bit OVM Operation is affected by OVM bit value.



Subtract Integer Conditionally SUBC



Syntax SUBC src, dst

Operation If (dst - src ≥ 0):

((dst - src) << 1) OR 1 -> dst
Else:

dst << 1 -> dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding

  31-23       22-21   20-16   15-0
  000101110   G       dst     src







Description The src operand is subtracted from the dst operand. The dst operand is
loaded with a value that depends upon the result of the subtraction. If (dst
- src) is greater than or equal to zero, then (dst - src) is left-shifted one bit,
the least-significant bit is set to 1, and the result is loaded into the dst
register. If (dst - src) is less than zero, dst is left-shifted one bit and loaded into
the dst register. The dst and src operands are assumed to be unsigned
integers.

SUBC may be used to perform a single step of a multi-bit integer division. 
See subsection 12.3.4 for a detailed description. 
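To show how repeated SUBC steps build a quotient, here is a C sketch of classic restoring shift-and-subtract division. It assumes a 16-bit dividend and divisor, with the divisor aligned at bit 15. After 16 steps the quotient sits in the low half of the register and the remainder in the high half. The procedure in subsection 12.3.4 may set up the operands differently, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One SUBC step, treating dst and src as unsigned integers. */
uint32_t subc(uint32_t dst, uint32_t src)
{
    if (dst >= src)                           /* dst - src >= 0 */
        return ((dst - src) << 1) | 1u;
    return dst << 1;
}

/* 16-bit unsigned division by 16 repeated SUBC steps.
 * Assumes dividend and divisor fit in 16 bits and divisor != 0. */
void divide_u16(uint32_t dividend, uint32_t divisor,
                uint32_t *quotient, uint32_t *remainder)
{
    assert(divisor != 0u && divisor <= 0xFFFFu && dividend <= 0xFFFFu);
    uint32_t acc = dividend;
    uint32_t aligned = divisor << 15;         /* divisor aligned at bit 15 */
    for (int i = 0; i < 16; ++i)
        acc = subc(acc, aligned);             /* one quotient bit per step */
    *quotient = acc & 0xFFFFu;                /* quotient shifted in at the bottom */
    *remainder = acc >> 16;                   /* remainder left in the upper half */
}
```

`subc(1270, 1170)` returns 201 and `subc(2000, 3000)` returns 4000, which matches the two SUBC examples. `divide_u16(1000, 7, &q, &r)` gives q = 142 and r = 6.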

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 






Example SUBC @98C5h,R1
Before Instruction:
DP = 80h

R1 = 04F6h = 1270
Data at 80 98C5h = 492h = 1170
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h

R1 = 0C9h = 201

Data at 80 98C5h = 492h = 1170

LUF LV UF N Z V C = 0 0 0 0 0 0 0

Example SUBC 3000,R0 (3000 = 0BB8h)

Before Instruction:

R0 = 07D0h = 2000
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 0FA0h = 4000
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Subtract Floating-Point Value SUBF 



Syntax SUBF src, dst

Operation dst - src -> dst

Operands src general addressing modes (G):

0 0 register (R0 - R11)

0 1 direct

1 0 indirect
1 1 immediate

dst register (R0 - R11)

Encoding

  31-23       22-21   20-16   15-0
  000101111   G       dst     src







Description The result of the dst operand minus the src operand is loaded into the 
dst register. The dst and src operands are assumed to be floating-point 
numbers. 

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise.

UF 1 if a floating-point underflow occurs, 0 otherwise.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
Example SUBF *AR0--(IR0),R5
Before Instruction:

AR0 = 80 9888h
IR0 = 80h

R5 = 07 33C0 0000h = 1.79750000e+02
Data at 80 9888h = 070C 8000h = 1.4050e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR0 = 80 9808h
IR0 = 80h

R5 = 05 1D00 0000h = 3.9250e+01

Data at 80 9888h = 070C 8000h = 1.4050e+02

LUF LV UF N Z V C = 0 0 0 0 0 0 0






SUBF3 Subtract Floating-Point, 3 Operands 



Syntax SUBF3 src2, src1, dst
Operation src1 - src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (R0 - R11)

Encoding
Type 1

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     0    01101   T       dst     src1   src2

Type 2

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     1    01101   T       dst     src1   src2

Instruction Word Fields

Type 1

  T    src1 addressing modes                     src2 addressing modes
  00   register mode (R0 - R11)                  register mode (R0 - R11)
  01   indirect mode (disp = 0, 1, IR0, IR1)     register mode (R0 - R11)
  10   register mode (R0 - R11)                  indirect mode (disp = 0, 1, IR0, IR1)
  11   indirect mode (disp = 0, 1, IR0, IR1)     indirect mode (disp = 0, 1, IR0, IR1)

Type 2

  T    src1 addressing modes                     src2 addressing modes
  01   register mode (any CPU register)          indirect mode *+ARn (5-bit unsigned displacement)
  11   indirect mode *+ARn1 (5-bit unsigned      indirect mode *+ARn2 (5-bit unsigned
       displacement)                             displacement)



Description The difference of the srd and src2 operands is loaded into the dst register. 

The srd, src2, and dst operands are assumed to be floating-point numbers. 

Cycles 1 



Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise.

UF 1 if a floating-point underflow occurs, 0 otherwise.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected.



Mode Bit OVM Operation is not affected by OVM bit value. 



SUBF3||STF Parallel SUBF3 and STF 



Syntax SUBF3 src1, src2, dst1
|| STF src3, dst2

Operation src2 - src1 -> dst1
|| src3 -> dst2

Operands src1 register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding

  31-30   29-25   24-22   21-19   18-16   15-8   7-0
  11      10101   dst1    src1    src3    dst2   src2



Description A floating-point subtraction and a floating-point store are performed in
parallel. All registers are read at the beginning and loaded at the end of the
execute cycle. This means that if one of the parallel operations (STF) reads
from a register and the operation being performed in parallel (SUBF3) writes
to the same register, then STF accepts as input the contents of the register
before it is modified by the SUBF3.

If src3 and dst1 point to the same location, src3 is read before the write to
dst1.
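The read-before-write rule can be expressed as a two-phase model: snapshot every operand, then commit both results. The C sketch below does this for SUBF3||STF using host floats, so rounding follows the host's IEEE arithmetic rather than the C40's. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Registers R0-R7 modeled as host floats. */
typedef struct { float r[8]; } regfile;

/* Sketch of SUBF3 src1, src2, dst1 || STF src3, dst2. */
void subf3_stf(regfile *rf, float *mem,
               int src1, int src2_addr, int dst1, int src3, int dst2_addr)
{
    assert(src1 >= 0 && src1 < 8 && dst1 >= 0 && dst1 < 8 && src3 >= 0 && src3 < 8);

    /* Read phase: every operand comes from the state before the instruction. */
    float a = rf->r[src1];
    float b = mem[src2_addr];
    float stored = rf->r[src3];

    /* Write phase: both results are committed at the end of the cycle. */
    rf->r[dst1] = b - a;          /* src2 - src1 -> dst1 */
    mem[dst2_addr] = stored;      /* src3 -> dst2 */
}
```

Take R1 = 62.8125, R7 = 179.75, and 140.5 at the src2 address. The model leaves 77.6875 in the dst1 register, as in the example. If src3 names the same register as dst1, the stored value is that register's old contents.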

Cycles 1 

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise. 

LV 1 if a floating-point overflow occurs, unchanged otherwise.

UF 1 if a floating-point underflow occurs, 0 otherwise.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example SUBF3 R1,*-AR4(IR1),R0
|| STF R7,*+AR5(IR0)

Before Instruction:

R1 = 05 7B40 0000h = 6.28125e+01
AR4 = 80 98B8h
IR1 = 8h
R0 = 0h

R7 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 9850h
IR0 = 10h

Data at 80 98B0h = 070C 8000h = 1.4050e+02
Data at 80 9860h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:

R1 = 05 7B40 0000h = 6.28125e+01
AR4 = 80 98B8h
IR1 = 8h

R0 = 06 1B60 0000h = 7.768750e+01
R7 = 07 33C0 0000h = 1.79750e+02
AR5 = 80 9850h
IR0 = 10h

Data at 80 98B0h = 070C 8000h = 1.4050e+02
Data at 80 9860h = 0733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
SUBI Subtract Integer



Syntax SUBI src, dst 
Operation dst - src -> dst 

Operands src general addressing modes (G): 

0 0 register (any register in CPU primary register file) 

0 1 direct 

1 0 indirect 
1 1 immediate 

dst register (any register in CPU primary register file) 

Encoding

  31-23       22-21   20-16   15-0
  000110000   G       dst     src





Description The dst operand minus the src operand is loaded into the dst register.
The dst and src operands are assumed to be signed integers.

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.

N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.

Example SUBI 220,R7

Before Instruction:

R7 = 226h = 550

LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:

R7 = 14Ah = 330

LUF LV UF N Z V C = 0 0 0 0 0 0 0
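The N, Z, V, and C rules listed for SUBI can be checked with a short C model of 32-bit two's-complement subtraction. This sketch assumes OVM = 0 (no saturation). It does not model the LV latch or the ST(SETCOND) register restrictions, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int n, z, v, c; } sub_flags;

/* Sketch of SUBI src, dst: returns dst - src and sets N, Z, V, C. */
int32_t subi(int32_t dst, int32_t src, sub_flags *f)
{
    uint32_t ud = (uint32_t)dst;
    uint32_t us = (uint32_t)src;
    uint32_t ur = ud - us;                                   /* wraps modulo 2^32 */

    f->c = ud < us;                                          /* borrow */
    f->v = (int)((((ud ^ us) & (ud ^ ur)) >> 31) & 1u);      /* signed overflow */
    f->n = (int)(ur >> 31);                                  /* sign of the result */
    f->z = (ur == 0u);

    /* Convert back to signed without an implementation-defined cast. */
    return ur <= (uint32_t)INT32_MAX ? (int32_t)ur : -(int32_t)(~ur) - 1;
}
```

For the example above, `subi(550, 220, &f)` returns 330 with all four flags clear.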



Subtract Integer, 3 Operands SUBI3

Syntax SUBI3 src2, src1, dst
Operation src1 - src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)

Encoding

Type 1

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     0    01110   T       dst     src1   src2

Type 2

  31-29   28   27-23   22-21   20-16   15-8   7-0
  001     1    01110   T       dst     src1   src2

Instruction Word Fields

Type 1

  T    src1 addressing modes                     src2 addressing modes
  00   register mode (any CPU register)          register mode (any CPU register)
  01   indirect mode (disp = 0, 1, IR0, IR1)     register mode (any CPU register)
  10   register mode (any CPU register)          indirect mode (disp = 0, 1, IR0, IR1)
  11   indirect mode (disp = 0, 1, IR0, IR1)     indirect mode (disp = 0, 1, IR0, IR1)

Type 2

  T    src1 addressing modes                     src2 addressing modes
  00   register mode (any CPU register)          8-bit signed immediate
  01   register mode (any CPU register)          indirect mode *+ARn (5-bit unsigned displacement)
  10   indirect mode *+ARn (5-bit unsigned       8-bit signed immediate
       displacement)
  11   indirect mode *+ARn1 (5-bit unsigned      indirect mode *+ARn2 (5-bit unsigned
       displacement)                             displacement)

Description The result of the src1 operand minus the src2 operand is loaded into the dst
register. The src1, src2, and dst operands are assumed to be signed
integers.

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow is generated, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 






Parallel SUBI3 and STI SUBI3||STI



Syntax SUBI3 src1, src2, dst1
|| STI src3, dst2

Operation src2 - src1 -> dst1
|| src3 -> dst2

Operands src1 register (R0 - R7)

src2 indirect (disp = 0, 1, IR0, IR1)

dst1 register (R0 - R7)

src3 register (R0 - R7)

dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding

  31-30   29-25   24-22   21-19   18-16   15-8   7-0
  11      10110   dst1    src1    src3    dst2   src2



Description An integer subtraction and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute cycle.
This means that if one of the parallel operations (STI) reads from a register
and the operation being performed in parallel (SUBI3) writes to the same
register, then STI accepts as input the contents of the register before it is
modified by the SUBI3.

If src3 and dst1 point to the same location, src3 is read before the write to
dst1.

Cycles 1

Status Bits LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an integer overflow occurs, 0 otherwise. 

C 1 if a borrow occurs, 0 otherwise. 



Mode Bit OVM Operation is affected by OVM bit value. 






Example SUBI3 R7,*+AR2(IR0),R1
|| STI R3,*++AR7

Before Instruction:

R7 = 14h = 20

AR2 = 80 982Fh

IR0 = 10h

R1 = 0h

R3 = 35h = 53

AR7 = 80 983Bh

Data at 80 983Fh = 0DCh = 220

Data at 80 983Ch = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 14h = 20

AR2 = 80 982Fh

IR0 = 10h

R1 = 0C8h = 200

R3 = 35h = 53

AR7 = 80 983Ch

Data at 80 983Fh = 0DCh = 220

Data at 80 983Ch = 35h = 53

LUF LV UF N Z V C = 0 0 0 0 0 0 0



Subtract Reverse Integer With Borrow SUBRB 



Syntax SUBRB src, dst

Operation src - dst - C -> dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding

  31-23       22-21   20-16   15-0
  000110001   G       dst     src







Description The difference of the src, dst, and C operands, as calculated above, is 
loaded into the dst register. The dst and src operands are assumed to be 
signed integers. 

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 

Example SUBRB R4,R6

Before Instruction:

R4 = 03CBh = 971
R6 = 0258h = 600
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R4 = 03CBh = 971
R6 = 0172h = 370
LUF LV UF N Z V C = 0 0 0 0 0 0 0
SUBRF Subtract Reverse Floating-Point Values



Syntax SUBRF src, dst

Operation src - dst -> dst

Operands src general addressing modes (G):

0 0 register (R0 - R11)

0 1 direct

1 0 indirect
1 1 immediate

dst register (R0 - R11)

Encoding

  31-23       22-21   20-16   15-0
  000110010   G       dst     src







Description The result of the src operand minus the dst operand is loaded into the dst
register. The dst and src operands are assumed to be floating-point
numbers.



Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.

LV 1 if a floating-point overflow occurs, unchanged otherwise.

UF 1 if a floating-point underflow occurs, 0 otherwise.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if a floating-point overflow occurs, 0 otherwise.

C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example SUBRF @9905h,R5

Before Instruction:

DP = 80h

R5 = 05 7B40 0000h = 6.281250e+01

Data at 80 9905h = 0733 C000h = 1.79750e+02

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction: 

DP = 80h 

R5 = 06 69E0 0000h = 1.16937500e+02
Data at 80 9905h = 0733 C000h = 1.79750e+02
LUF LV UF N Z V C = 0 0 0 0 0 0 0



Subtract Reverse Integer SUBRI



Syntax SUBRI src, dst

Operation src - dst -> dst

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)

0 1 direct

1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)

Encoding

  31-23       22-21   20-16   15-0
  000110011   G       dst     src







Description The result of the src operand minus the dst operand is loaded into the dst 
register. The dst and src operands are assumed to be signed integers. 

Cycles 1 

Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 
Z 1 if a zero result is generated, 0 otherwise. 
V 1 if an integer overflow occurs, 0 otherwise. 
C 1 if a borrow occurs, 0 otherwise. 

Mode Bit OVM Operation is affected by OVM bit value. 
Example SUBRI *AR5++(IR0),R3
Before Instruction:

AR5 = 80 9900h
IR0 = 8h

R3 = 0DCh = 220

Data at 80 9900h = 226h = 550

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR5 = 80 9908h
IR0 = 8h

R3 = 14Ah = 330

Data at 80 9900h = 226h = 550

LUF LV UF N Z V C = 0 0 0 0 0 0 0
SWI Software Interrupt



Syntax SWI 

Operation Performs an emulation interrupt 
Operands None 
Encoding

  31-24      23-16      15-8       7-0
  01100110   00000000   00000000   00000000



Description The SWI instruction performs an emulator interrupt. This is a reserved
instruction and should not be used in normal programming.

Cycles 4 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 

Mode Bit OVM Operation is not affected by OVM bit value. 
Convert to IEEE Format TOIEEE



Syntax TOIEEE src, dst

Operation convert src to IEEE format

Operands src extended-precision register (R0 - R11),
direct and indirect addressing modes
dst extended-precision register

Encoding

  31-23       22-21   20-16   15-0
  000110111   G       dst     src

Instruction Word Fields

  G    src addressing modes
  00   register mode (extended-precision register R0 - R11)
  01   direct mode
  10   indirect mode



Description The src operand is converted from a twos-complement floating-point format
to the IEEE floating-point format.

The src operand is assumed to be a single-precision floating-point number.
The converted result goes into the 32 MSBs of the dst register. STF can be
used to store the result to memory.
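Here is a bit-level sketch of the conversion in C, under stated assumptions. The C40 word is assumed to have exponent e in bits 31-24, sign in bit 23, and fraction f in bits 22-0. Only exponents from -126 to 126 (plus the zero encoding) are handled. The sketch does not model how the device treats the extreme exponents or when it sets LV and V. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a C40 single-precision word to an IEEE single-precision bit pattern. */
uint32_t c40_to_ieee(uint32_t word)
{
    int32_t e = (int32_t)((word >> 24) & 0xFFu);
    if (e >= 128)
        e -= 256;                                    /* sign-extend the exponent */
    if (e == -128)
        return 0u;                                   /* C40 zero -> IEEE +0 */
    assert(e >= -126 && e <= 126);                   /* edge cases not modeled */

    uint32_t s = (word >> 23) & 1u;
    uint32_t f = word & 0x7FFFFFu;
    if (!s)                                          /* (1.f) * 2^e */
        return ((uint32_t)(e + 127) << 23) | f;
    if (f == 0u)                                     /* -2 * 2^e = -1.0 * 2^(e+1) */
        return 0x80000000u | ((uint32_t)(e + 128) << 23);
    /* -(2 - f/2^23) * 2^e: magnitude is 1.(2^23 - f) * 2^e */
    return 0x80000000u | ((uint32_t)(e + 127) << 23) | (0x800000u - f);
}
```

For example, the C40 word 0733C000h (179.75) converts to 4333C000h, the IEEE single-precision pattern for 179.75.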

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



TOIEEE||STF Parallel TOIEEE and STF



Syntax TOIEEE src2, dst1
|| STF src3, dst2

Operation convert src2 to IEEE format -> dst1
in parallel with
src3 -> dst2

Operands src2 indirect mode (disp = 0, 1, IR0, IR1)
dst1 register mode (R0 - R7)
src3 register mode (R0 - R7)
dst2 indirect mode (disp = 0, 1, IR0, IR1)

Encoding

  31-30   29-25   24-22   21-19   18-16   15-8   7-0
  11      11000   dst1    000     src3    dst2   src2



Description The src2 operand is converted from a twos-complement floating-point
format to the IEEE floating-point format.

The src2 operand is assumed to be a single-precision floating-point number.
The converted result goes into the 32 MSBs of the dst1 register. In parallel,
a floating-point store is done.

If src2 and dst2 point to the same location, then src2 is read before the write
to dst2.

Cycles 1 

Status Bits LUF Unaffected. 

LV 1 if an overflow occurs, unchanged otherwise. 

UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise. 

V 1 if an overflow occurs, 0 otherwise. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 
Trap Conditionally TRAPcond



Syntax TRAPcond N
Operation If (cond is true):
ST(GIE) -> ST(PGIE)
ST(CF) -> ST(PCF)
0 -> ST(GIE)
1 -> ST(CF)
next PC -> *(++SP)
trap vector N -> PC
Else, continue.

Operands N immediate mode (0 ≤ N ≤ 511)

Encoding

  31-23       22-21   20-16   15-9      8-0
  011101000   00      cond    0000000   N





Description If traps are to be nested, you may need to save the status register before
executing TRAPcond. If the condition is true, GIE and CF are saved in PGIE
and PCF in the status register, all interrupts are disabled (0 -> GIE), and the
cache is frozen (1 -> CF). Then the contents of the PC are pushed onto the
system stack, and the PC is loaded with the contents of the specified trap
vector (N). If the condition is not true, normal operation continues.
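The TRAPcond sequence can be traced with a hypothetical CPU-state model in C. The trap vector table and the stack are plain arrays here, and the field names are illustrative only. This entry does not say where the vector table lives in device memory, so the model does not try to place it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pc, sp;
    int gie, pgie, cf, pcf;          /* status-register bits */
    uint32_t stack[64];              /* system stack, indexed by sp */
    uint32_t trap_vectors[512];      /* one entry per N, 0 <= N <= 511 */
} cpu_state;

void trap_cond(cpu_state *cpu, int cond_true, uint32_t n, uint32_t next_pc)
{
    assert(n <= 511u);
    if (!cond_true) {                /* condition false: continue normally */
        cpu->pc = next_pc;
        return;
    }
    assert(cpu->sp + 1u < 64u);
    cpu->pgie = cpu->gie;            /* ST(GIE) -> ST(PGIE) */
    cpu->pcf = cpu->cf;              /* ST(CF)  -> ST(PCF)  */
    cpu->gie = 0;                    /* disable all interrupts */
    cpu->cf = 1;                     /* freeze the cache */
    cpu->stack[++cpu->sp] = next_pc; /* next PC -> *(++SP) */
    cpu->pc = cpu->trap_vectors[n];  /* trap vector N -> PC */
}
```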



Cycles

Status Bits GIE Set to 0 if TRAP executes.

LUF Unaffected. 

LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



TSTB Test Bit Fields



Syntax TSTB src, dst

Operation dst AND src

Operands src general addressing modes (G):

0 0 register (any register in CPU primary register file)
0 1 direct
1 0 indirect
1 1 immediate

dst register (any register in CPU primary register file)



Encoding [diagram, bits 31-0: 000 110 100, G, dst, src]





Description The bitwise logical AND of the dst and src operands is formed, but the result is not loaded into any register. This allows for nondestructive compares. The dst and src operands are assumed to be unsigned integers.



Cycles 1



Status Bits These condition flags are modified for all destination registers. 
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example TSTB *-AR4(1),R5

Before Instruction:

AR4 = 80 99C5h

R5 = 898h = 2200

Data at 80 99C4h = 767h = 1895

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR4 = 80 99C5h

R5 = 898h = 2200

Data at 80 99C4h = 767h = 1895

LUF LV UF N Z V C = 0 0 0 0 1 0 0 
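The flag results in this example can be reproduced with a short C model of TSTB on 32-bit operands: 898h AND 767h is 0, so Z is set. The struct and function names are hypothetical. The sketch covers only the N, Z, V, and UF results listed under Status Bits.

```c
#include <stdint.h>

/* Flags that TSTB changes (LUF, LV, and C are unaffected and not modeled). */
typedef struct { int n, z, v, uf; } tstb_flags_t;

/* Sketch of TSTB src, dst: the AND is formed only to set flags. */
tstb_flags_t tstb(uint32_t src, uint32_t dst)
{
    uint32_t out = dst & src;       /* result is not stored anywhere */
    tstb_flags_t f;
    f.n  = (int)(out >> 31);        /* N = MSB of the output */
    f.z  = (out == 0);              /* Z = 1 for a zero output */
    f.v  = 0;                       /* V cleared */
    f.uf = 0;                       /* UF cleared */
    return f;
}
```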
Test Bit Fields, 3 Operands TSTB3



Syntax TSTB3 src2, src1

Operation src1 AND src2

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
Encoding

Type 1 [diagram, bits 31-0: 001 0 01111, T, 00000, src1, src2]

Type 2 [diagram, bits 31-0: 001 1 01111, T, 00000, src1, src2]

Instruction Word Fields

Type 1

T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2

T    src1 addressing modes                                src2 addressing modes
00   register mode (any CPU register)                     8-bit signed immediate
01   register mode (any CPU register)                     indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)     8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)    indirect mode *+ARn2(5-bit unsigned displacement)
Description The bitwise logical AND of the src1 and src2 operands is formed, but the result is not loaded into any register. This allows for nondestructive compares. The src1 and src2 operands are assumed to be signed integers.

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Bitwise Exclusive OR XOR



Syntax XOR src, dst 
Operation dst XOR src -> dst

Operands src general addressing modes (G): 

0 0 register (any register in CPU primary register file) 

0 1 direct 

1 0 indirect 
1 1 immediate 

dst register (any register in CPU primary register file) 

Encoding [diagram, bits 31-0: 000 110 101, G, dst, src]



Description The bitwise exclusive OR of the src and dst operands is loaded into the dst 
register. The dst and src operands are assumed to be unsigned integers. 

Cycles 1 

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST(SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 
Example XOR R1,R2

Before Instruction:

R1 = 0F FA32h
R2 = 0F F5C1h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R1 = 0F FA32h
R2 = 00 0FF3h

LUF LV UF N Z V C = 0 0 0 0 0 0 0
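The following minimal C sketch models XOR, including the SETCOND rule that decides whether the flags are updated. The boolean inputs dst_is_r0_r11 and setcond are assumptions standing in for the destination-register check and ST(SETCOND). All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int n, z, v, uf; } logic_flags_t;

/* Sketch of XOR src, dst. Returns true when the flags were updated. */
bool xor_op(uint32_t src, uint32_t *dst, bool dst_is_r0_r11, bool setcond,
            logic_flags_t *flags)
{
    *dst ^= src;                          /* dst XOR src -> dst */
    if (!(setcond || dst_is_r0_r11))
        return false;                     /* flags left unchanged */
    flags->n  = (int)(*dst >> 31);        /* N = MSB of the output */
    flags->z  = (*dst == 0);              /* Z = 1 for a zero output */
    flags->v  = 0;
    flags->uf = 0;
    return true;
}
```

With the example's operands, 0F FA32h XOR 0F F5C1h gives 00 0FF3h. R1 (the source) is unchanged.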
Syntax XOR3 src2, src1, dst

Operation src1 XOR src2 -> dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file) 

Encoding

Type 1 [diagram, bits 31-0: 001 0 10000, T, dst, src1, src2]

Type 2 [diagram, bits 31-0: 001 1 10000, T, dst, src1, src2]

Instruction Word Fields

Type 1

T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2

T    src1 addressing modes                                src2 addressing modes
00   register mode (any CPU register)                     8-bit signed immediate
01   register mode (any CPU register)                     indirect mode *+ARn(5-bit unsigned displacement)
10   indirect mode *+ARn(5-bit unsigned displacement)     8-bit signed immediate
11   indirect mode *+ARn1(5-bit unsigned displacement)    indirect mode *+ARn2(5-bit unsigned displacement)
Bitwise Exclusive OR, 3 Operands XOR3



Description The bitwise exclusive OR of the src1 and src2 operands is loaded into the dst register. The src1, src2, and dst operands are assumed to be signed integers.

Cycles 1 

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the condition flags are modified. If ST(SETCOND) = 1, they are modified for all destination registers.
LUF Unaffected. 
LV Unaffected. 
UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 
V 0. 

C Unaffected. 
Mode Bit OVM Operation is not affected by OVM bit value. 



XOR3||STI Parallel XOR3 and STI



Syntax XOR3 src2, src1, dst1
|| STI src3, dst2

Operation src1 XOR src2 -> dst1
|| src3 -> dst2

Operands src1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)



Encoding [diagram, bits 31-0: 11 10111, dst1, src1, src3, dst2, src2]



Description A bitwise exclusive OR and an integer store are performed in parallel. All registers are read at the beginning and loaded at the end of the execute cycle. This means that if one of the parallel operations (STI) reads from a register and the operation being performed in parallel (XOR3) writes to the same register, STI accepts as input the contents of the register before it is modified by XOR3.

If src2 and dst2 point to the same location, src2 is read before the write to dst2.

Cycles 1 

Status Bits LUF Unaffected. 

LV Unaffected. 

UF 0. 

N MSB of the output. 

Z 1 if a zero output is generated, 0 otherwise. 

V 0. 

C Unaffected. 



Mode Bit OVM Operation is not affected by OVM bit value. 



Example XOR3 *AR1++,R3,R3
|| STI R6,*-AR2(IR0)

Before Instruction:

AR1 = 80 987Eh
R3 = 85h
R6 = 0DCh = 220
AR2 = 80 98B4h
IR0 = 8h

Data at 80 987Eh = 85h
Data at 80 98ACh = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR1 = 80 987Fh
R3 = 0h
R6 = 0DCh = 220
AR2 = 80 98B4h
IR0 = 8h

Data at 80 987Eh = 85h
Data at 80 98ACh = 0DCh = 220

LUF LV UF N Z V C = 0 0 0 0 1 0 0
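The key detail of this parallel instruction is the read-before-write rule. The sketch below models it in C by reading every source operand before writing any result. The register array, the word-indexed memory, and the function name are hypothetical. Address-register updates such as *AR1++ are not modeled.

```c
#include <stdint.h>

/* Sketch of XOR3 src2, src1, dst1 || STI src3, dst2.
   reg[] stands for R0-R7; mem[] is a word-indexed memory. */
void xor3_sti(uint32_t reg[8], uint32_t *mem,
              unsigned src1, unsigned src2_addr, unsigned dst1,
              unsigned src3, unsigned dst2_addr)
{
    /* Read phase: every operand is fetched before anything is written. */
    uint32_t a = reg[src1];          /* XOR3 register operand */
    uint32_t b = mem[src2_addr];     /* XOR3 memory operand (read before any store) */
    uint32_t c = reg[src3];          /* STI source register */

    /* Write phase. */
    reg[dst1] = a ^ b;               /* src1 XOR src2 -> dst1 */
    mem[dst2_addr] = c;              /* src3 -> dst2 */
}
```

If src3 and dst1 are the same register, STI stores the value that register held before XOR3 wrote to it.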
Chapter 12



Software Applications 




This chapter explains how to use the instruction set, the architecture, and the interface of the 'C40. It presents coding examples for frequently used applications and discusses more involved examples and applications. It also defines the principles involved in each application and gives the corresponding assembly-language code for instructional purposes and for immediate use. Whenever the detailed explanation of the underlying theory is too extensive to be included in this manual, appropriate references are given for further information.

Major topics discussed in this chapter are listed below: 

Section Page 

12.1 Processor Initialization 12-3 

■ Reset Process 12-3 

■ Initialization 12-3 

12.2 Program Control 12-9 

■ Regular and Zero-Overhead Subroutine Calls 12-9 

■ Software Stack 12-13 

■ Interrupt Service Routines 12-14 

■ Delayed Branches 12-22 

■ Repeat Modes 12-23

■ Computed GOTOs 12-27

12.3 Logical and Arithmetic Operations 12-28

■ Bit Manipulation 12-28

■ Block Moves 12-29

■ Byte and Half Word Manipulation 12-30

■ Bit Reversed Addressing 12-31

■ Division 12-33

■ Square Root 12-38

■ Extended-Precision Arithmetic 12-41

■ IEEE <==> C40 Floating-Point Conversions 12-42

12.4 Application-Oriented Operations 12-46

■ Companding (A-law/μ-law) 12-46

■ FIR/IIR Filters (fixed and adaptive) 12-51

■ Matrix Math 12-61

■ FFT 12-63

■ Lattice Filters 12-88

12.5 Programming Tips 12-94

■ C-Callable Routines 12-94

■ Hints for Assembly Coding 12-95

12.6 Peripherals 12-97 

■ Timer 12-97 

■ Communication Port 12-98 

■ Direct Memory Access 12-101 
Processor Initialization

12.1 Processor Initialization 

12.1.1 Reset Process 

Before you execute a DSP algorithm, you must initialize the processor. Generally, initialization takes place any time the processor is reset.

When reset is activated by applying a low level to the RESET input for several cycles, the 'C40 terminates execution and loads the program counter with the contents of the RESET vector. The 'C40 RESET vector can be mapped to one of four locations, selected by the value of the RESETLOC(1,0) pins at reset (shown in Table 12-1). The RESET vector normally contains the address of the system initialization routine. The hardware reset also initializes various registers and status bits (reset conditions are further defined in Section 6.6 on page 6-18).

Table 12-1. Relationship of RESETLOC(1,0) Pins to RESET Vector Location

RESETLOC(1,0)   RESET Vector Address
00              0000 0000h
01              7FFF FFFFh
10              8000 0000h
11              FFFF FFFFh
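As a quick cross-check of Table 12-1, the mapping can be written as a lookup. In this minimal C sketch, the function name reset_vector_address is hypothetical. Packing RESETLOC1 into bit 1 and RESETLOC0 into bit 0 of the argument is an assumption made for illustration.

```c
#include <stdint.h>

/* Returns the RESET vector address selected by RESETLOC(1,0) (Table 12-1).
   resetloc: bit 1 = RESETLOC1, bit 0 = RESETLOC0 (assumed packing). */
uint32_t reset_vector_address(unsigned resetloc)
{
    static const uint32_t table[4] = {
        0x00000000u,   /* 00 */
        0x7FFFFFFFu,   /* 01 */
        0x80000000u,   /* 10 */
        0xFFFFFFFFu    /* 11 */
    };
    return table[resetloc & 3u];
}
```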



After reset, initialize the processor further by executing instructions that set 
up operational modes, memory pointers, interrupts, and the remaining 
functions needed to meet system requirements. 

12.1.2 Initialization 

To configure the processor at reset, the following internal functions should 
be initialized: 

□ CPU expansion register file 

□ Memory-mapped registers 

□ Interrupt structure 



Example 12-1 shows code that initializes the 'C40 to the following machine state, in addition to the initialization performed during the hardware reset (for conditions after hardware reset, see Section 6.6 on page 6-18):

□ All interrupts are enabled. 

□ The program cache is enabled. 

□ The overflow mode is disabled. 

□ The data memory page pointer is set to zero. 

□ The stack pointer is set to internal RAM address 002F FF00h.

□ The internal memory is filled with zeros. 

Note that all constants larger than 16 bits should be placed in memory and accessed through direct or indirect addressing.

Example 12-1. Processor Initialization Example 
*
* TITLE 'PROCESSOR INITIALIZATION EXAMPLE'
*
         .global RESET, INIT, BEGIN
         .global TIME0, TIME1, TINT0, TINT1
         .global NMI, INT0, INT1, INT2, INT3
         .global NON_MASK, ISR0, ISR1, ISR2, ISR3
         .global ICFULL0, ICRDY0, OCRDY0, OCEMPTY0
         .global ICFSR0, ICRSR0, OCRSR0, OCESR0
         .global ICFULL1, ICRDY1, OCRDY1, OCEMPTY1
         .global ICFSR1, ICRSR1, OCRSR1, OCESR1
         .global ICFULL2, ICRDY2, OCRDY2, OCEMPTY2
         .global ICFSR2, ICRSR2, OCRSR2, OCESR2
         .global ICFULL3, ICRDY3, OCRDY3, OCEMPTY3
         .global ICFSR3, ICRSR3, OCRSR3, OCESR3
         .global ICFULL4, ICRDY4, OCRDY4, OCEMPTY4
         .global ICFSR4, ICRSR4, OCRSR4, OCESR4
         .global ICFULL5, ICRDY5, OCRDY5, OCEMPTY5
         .global ICFSR5, ICRSR5, OCRSR5, OCESR5
         .global DINT0, DINT1, DINT2, DINT3, DINT4, DINT5
         .global DMA0, DMA1, DMA2, DMA3, DMA4, DMA5
         .global TRAP0, TRAP1, TRAP2, TRAP3, TRAP4, TRAP5
         .global TRP0, TRP1, TRP2, TRP3, TRP4, TRP5
Example 12-1. Processor Initialization Example (Continued)

*
* PROCESSOR INITIALIZATION FOR THE TMS320C40.
*
* RESET AND INTERRUPT VECTOR SPECIFICATION. THIS ARRANGEMENT
* ASSUMES THAT DURING LINKING, THE FOLLOWING TEXT SEGMENTS
* WILL BE PLACED TO START AT THESE MEMORY LOCATIONS:
*
*   SEGMENT NAME | MEMORY LOCATION
*   -------------+-----------------------------
*   reset_adr    | SAME AS RESETLOC(1,0) SETUP
*   int_vect     | 00000200h
*   trap_vect    | 00000400h
*   .data        | 00000500h
*   .text        | 00000600h
*
* NOTE THAT THE INTERRUPT AND TRAP VECTOR TABLES CAN BE
* RELOCATED TO A 512-WORD BOUNDARY BY CHANGING THE VALUES
* OF IVTP AND TVTP.
*
         .sect  "reset_adr"     ; Named section for RESET vector
RESET    .word  INIT            ; RESET loads address INIT to PC
*
         .sect  "int_vect"      ; Named section for interrupt structures
         .space 1               ; Reserved space
NMI      .word  NON_MASK        ; Nonmaskable interrupt NMI loads address NON_MASK to PC
TINT0    .word  TIME0           ; Timer 0 interrupt processing
INT0     .word  ISR0            ; INT0 loads address ISR0 to PC
INT1     .word  ISR1            ; INT1 loads address ISR1 to PC
INT2     .word  ISR2            ; INT2 loads address ISR2 to PC
INT3     .word  ISR3            ; INT3 loads address ISR3 to PC
         .space 6               ; Reserved space
ICFULL0  .word  ICFSR0          ; Comm. port 0 input full processing
ICRDY0   .word  ICRSR0          ; Comm. port 0 input ready processing
OCRDY0   .word  OCRSR0          ; Comm. port 0 output ready processing
OCEMPTY0 .word  OCESR0          ; Comm. port 0 output empty processing
ICFULL1  .word  ICFSR1          ; Comm. port 1 input full processing
ICRDY1   .word  ICRSR1          ; Comm. port 1 input ready processing
OCRDY1   .word  OCRSR1          ; Comm. port 1 output ready processing
OCEMPTY1 .word  OCESR1          ; Comm. port 1 output empty processing
ICFULL2  .word  ICFSR2          ; Comm. port 2 input full processing
ICRDY2   .word  ICRSR2          ; Comm. port 2 input ready processing
OCRDY2   .word  OCRSR2          ; Comm. port 2 output ready processing
OCEMPTY2 .word  OCESR2          ; Comm. port 2 output empty processing
ICFULL3  .word  ICFSR3          ; Comm. port 3 input full processing
ICRDY3   .word  ICRSR3          ; Comm. port 3 input ready processing
OCRDY3   .word  OCRSR3          ; Comm. port 3 output ready processing
OCEMPTY3 .word  OCESR3          ; Comm. port 3 output empty processing
Example 12-1. Processor Initialization Example (Continued)

ICFULL4  .word  ICFSR4          ; Comm. port 4 input full processing
ICRDY4   .word  ICRSR4          ; Comm. port 4 input ready processing
OCRDY4   .word  OCRSR4          ; Comm. port 4 output ready processing
OCEMPTY4 .word  OCESR4          ; Comm. port 4 output empty processing
ICFULL5  .word  ICFSR5          ; Comm. port 5 input full processing
ICRDY5   .word  ICRSR5          ; Comm. port 5 input ready processing
OCRDY5   .word  OCRSR5          ; Comm. port 5 output ready processing
OCEMPTY5 .word  OCESR5          ; Comm. port 5 output empty processing
DINT0    .word  DMA0            ; DMA channel 0 interrupt
DINT1    .word  DMA1            ; DMA channel 1 interrupt
DINT2    .word  DMA2            ; DMA channel 2 interrupt
DINT3    .word  DMA3            ; DMA channel 3 interrupt
DINT4    .word  DMA4            ; DMA channel 4 interrupt
DINT5    .word  DMA5            ; DMA channel 5 interrupt
TINT1    .word  TIME1           ; Timer 1 interrupt processing
         .space 21              ; Reserved space
*
         .sect  "trap_vect"     ; Named section for trap structures
TRAP0    .word  TRP0            ; Trap 0 vector processing begins
TRAP1    .word  TRP1            ; Trap 1 vector processing begins
TRAP2    .word  TRP2            ; Trap 2 vector processing begins
TRAP3    .word  TRP3            ; Trap 3 vector processing begins
TRAP4    .word  TRP4            ; Trap 4 vector processing begins
TRAP5    .word  TRP5            ; Trap 5 vector processing begins
         .space 506             ; Leave space for the other 506 traps
*
* IN THIS SECTION, CONSTANTS THAT CANNOT BE REPRESENTED
* IN THE SHORT FORMAT ARE INITIALIZED.
*
         .data
MASK     .word  0FFFFFFFFH
BLK0     .word  02FF800H        ; Beginning address of RAM block 0
BLK1     .word  02FFC00H        ; Beginning address of RAM block 1
CTRL     .word  0100000H        ; Pointer for peripheral-bus memory map
GLOINT   .word  0000000H        ; Init of global memory interface control (0)
LOCALINT .word  0000000H        ; Init of local memory interface control (4)
DMA0CTL  .word  0000000H        ; Initialization for DMA 0 control (160)
DMA1CTL  .word  0000000H        ; Initialization for DMA 1 control (176)
DMA2CTL  .word  0000000H        ; Initialization for DMA 2 control (192)
DMA3CTL  .word  0000000H        ; Initialization for DMA 3 control (208)
DMA4CTL  .word  0000000H        ; Initialization for DMA 4 control (224)
DMA5CTL  .word  0000000H        ; Initialization for DMA 5 control (240)
CP0CTL   .word  0000000H        ; Init of comm. port 0 control (64)
CP1CTL   .word  0000000H        ; Init of comm. port 1 control (80)
CP2CTL   .word  0000000H        ; Init of comm. port 2 control (96)
CP3CTL   .word  0000000H        ; Init of comm. port 3 control (112)
CP4CTL   .word  0000000H        ; Init of comm. port 4 control (128)
CP5CTL   .word  0000000H        ; Init of comm. port 5 control (144)
TIM0CTL  .word  0000000H        ; Initialization of timer 0 control (32)
TIM1CTL  .word  0000000H        ; Initialization of timer 1 control (48)
Example 12-1. Processor Initialization Example (Continued)

ANACTL   .word  0000000H        ; Initialization of analysis module (16)
STCK     .word  02FFF00H        ; Beginning of stack
*
         .text
*
* THE ADDRESS AT RESET VECTOR DIRECTS EXECUTION TO BEGIN HERE
* FOR RESET PROCESSING THAT INITIALIZES THE PROCESSOR. WHEN
* RESET IS APPLIED, THE FOLLOWING REGISTERS ARE INITIALIZED
* TO ZERO:
*
*   ST    CPU STATUS REGISTER
*   DIE   DMA INTERRUPT ENABLE REGISTER
*   IIE   INTERNAL INTERRUPT ENABLE REGISTER
*   IIF   IIOF PINS AND INTERRUPT FLAG REGISTER
*   IVTP  INTERRUPT-VECTOR TABLE POINTER
*   TVTP  TRAP-VECTOR TABLE POINTER
*
* THE STATUS REGISTER HAS THE FOLLOWING ARRANGEMENT:
*
*   BITS:     31-17  16        15    14    13   12  11  10
*   FUNCTION: RESRV  ANALYSIS  SET   PGIE  GIE  CC  CE  CF
*                    IDLE      COND
*
*   BITS:     9    8   7    6    5   4   3   2   1   0
*   FUNCTION: PCF  RM  OVM  LUF  LV  UF  N   Z   V   C
*
INIT     LDP    MASK            ; Point the DP register to page 0
         LDI    1800H,ST        ; Clear and enable cache, and disable OVM
*
* SET UP IVTP AND TVTP TO 200H AND 400H
*
         LDI    0200H,AR0       ; Set primary register AR0 to 200H
         LDPE   AR0,IVTP        ; Set expansion register IVTP to 200H
         ADDI   0200H,AR0       ; Set primary register AR0 to 400H
         LDPE   AR0,TVTP        ; Set expansion register TVTP to 400H
         LDI    @MASK,IIE       ; Enable all interrupts
*
* INTERNAL DATA MEMORY INITIALIZATION TO FLOATING-POINT ZERO
*
         LDI    @BLK0,AR0       ; AR0 points to block 0
         LDI    @BLK1,AR1       ; AR1 points to block 1
         LDF    0.0,R0          ; Zero register R0
         RPTS   1023            ; Repeat 1024 times...
         STF    R0,*AR0++(1)    ; Zero out location in RAM block 0
||       STF    R0,*AR1++(1)    ; and zero out location in RAM block 1
*
* THE PROCESSOR IS INITIALIZED. THE REMAINING APPLICATION-
* DEPENDENT PART OF THE SYSTEM (BOTH ON- AND OFF-CHIP) SHOULD
* NOW BE INITIALIZED.
*
* FIRST, INITIALIZE THE CONTROL REGISTERS. IN THIS EXAMPLE,
* EVERYTHING IS INITIALIZED TO ZERO SINCE THE ACTUAL
* INITIALIZATION IS APPLICATION DEPENDENT.
*






Example 12-1. Processor Initialization Example (Concluded) 



         LDI    @CTRL,AR0       ; Load in AR0 the pointer to control registers
         LDI    @GLOINT,R0
         STI    R0,*AR0         ; Init global memory interface control
         LDI    @LOCALINT,R0
         STI    R0,*+AR0(4)     ; Init local memory interface control
         LDI    @DMA0CTL,R0
         STI    R0,*+AR0(160)   ; Init DMA 0 control
         LDI    @DMA1CTL,R0
         STI    R0,*+AR0(176)   ; Init DMA 1 control
         LDI    @DMA2CTL,R0
         STI    R0,*+AR0(192)   ; Init DMA 2 control
         LDI    @DMA3CTL,R0
         STI    R0,*+AR0(208)   ; Init DMA 3 control
         LDI    @DMA4CTL,R0
         STI    R0,*+AR0(224)   ; Init DMA 4 control
         LDI    @DMA5CTL,R0
         STI    R0,*+AR0(240)   ; Init DMA 5 control
         LDI    @CP0CTL,R0
         STI    R0,*+AR0(64)    ; Init communication port 0 control
         LDI    @CP1CTL,R0
         STI    R0,*+AR0(80)    ; Init communication port 1 control
         LDI    @CP2CTL,R0
         STI    R0,*+AR0(96)    ; Init communication port 2 control
         LDI    @CP3CTL,R0
         STI    R0,*+AR0(112)   ; Init communication port 3 control
         LDI    @CP4CTL,R0
         STI    R0,*+AR0(128)   ; Init communication port 4 control
         LDI    @CP5CTL,R0
         STI    R0,*+AR0(144)   ; Init communication port 5 control
         LDI    @TIM0CTL,R0
         STI    R0,*+AR0(32)    ; Init timer 0 control
         LDI    @TIM1CTL,R0
         STI    R0,*+AR0(48)    ; Init timer 1 control
         LDI    @ANACTL,R0
         STI    R0,*+AR0(16)    ; Init analysis module control
         LDI    @STCK,SP        ; Initialize the stack pointer
         OR     2000H,ST        ; Global interrupt enable
         BR     BEGIN           ; Branch to the beginning of application
         .end






Program Control — Subroutines 



12.2 Program Control 

TMS320C40 instructions provide program control and facilitate high-speed processing. These instructions directly handle:

□ Regular and zero-overhead subroutine calls 

□ Software stack 

□ Interrupts 

□ Delayed branches 

□ Single- and multiple-instruction loops without any overhead 

12.2.1 Subroutines 

The 'C40 provides two ways to call a subroutine: regular and zero-overhead. Regular calls save the return address on the software stack (addressed by SP); zero-overhead calls save it in the extended-precision register R11. The following subsections use example programs to explain how this works.

12.2.1.1 Regular Subroutine Calls

The 'C40 has a 32-bit program counter (PC) and a software stack whose size is limited only by available memory. The CALL and CALLcond instructions increment the stack pointer and store the address of the next instruction (the next value of the PC) on the stack. At the end of the subroutine, RETScond performs a conditional return.
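The stack discipline of CALL and RETScond can be sketched in C as follows. The cpu_t structure and its small memory are hypothetical. The sketch follows this section: the stack grows upward, SP points at the last word pushed, and the pushed return address is PC + 1. Advancing to PC + 1 when the return condition is false is a simplification.

```c
#include <stdint.h>

/* Hypothetical machine state; mem is word-addressed, sp indexes mem. */
typedef struct {
    uint32_t pc;
    uint32_t sp;
    uint32_t mem[256];
} cpu_t;

/* CALL target: next PC -> *(++SP), target -> PC */
void call(cpu_t *c, uint32_t target)
{
    c->mem[++c->sp] = c->pc + 1;
    c->pc = target;
}

/* RETScond: if cond, *(SP--) -> PC; otherwise fall through */
void rets_cond(cpu_t *c, int cond)
{
    if (cond)
        c->pc = c->mem[c->sp--];
    else
        c->pc += 1;
}
```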

Example 12-2 illustrates the use of a subroutine to compute the dot product of two vectors. Given two vectors of length N, represented by the arrays a[0], a[1], ..., a[N-1] and b[0], b[1], ..., b[N-1], the dot product is computed from the expression

d = a[0] b[0] + a[1] b[1] + ... + a[N-1] b[N-1]

Processing proceeds in the main routine to the point where the dot product is to be computed. It is assumed that the arguments of the subroutine have been appropriately initialized. At this point, a CALL is made to the subroutine, transferring control to that section of program memory; when the subroutine has completed, the RETS instruction returns control to the calling routine. Note that for this particular example, it would suffice to save only register R2; more registers are saved for demonstration purposes. The saved registers are stored on the system stack, which should be large enough to accommodate the maximum anticipated storage requirements. Other methods of saving registers could be used equally well.
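For reference, the following C function computes the same result, in the same order, as the DOT subroutine in Example 12-2. It forms a[0]*b[0] first. On each iteration it then adds the previous product while forming the next one, mirroring the parallel MPYF3 || ADDF3 pair. This is a sketch: it uses IEEE float rather than the 'C40 floating-point format, so rounding may differ. Like the assembly code, it assumes N >= 2.

```c
#include <stddef.h>

/* Hypothetical C counterpart of DOT; requires n >= 2. */
float dot(const float *a, const float *b, size_t n)
{
    float r0 = a[0] * b[0];        /* MPYF3 *AR0,*AR1,R0 */
    float r2 = 0.0f;               /* SUBF R2,R2,R2 */
    for (size_t i = 1; i < n; i++) {
        r2 += r0;                  /* ADDF3 R0,R2,R2 adds the previous product */
        r0 = a[i] * b[i];          /* MPYF3 forms the next product */
    }
    return r0 + r2;                /* final ADDF3 R0,R2,R0 */
}
```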
Example 12-2. Regular Subroutine Call (Dot Product)

*
* TITLE REGULAR SUBROUTINE CALL (DOT PRODUCT)
*
* MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
* DOT PRODUCT OF TWO VECTORS.
*
         LDI    @blk0,AR0       ; AR0 points to vector a
         LDI    @blk1,AR1       ; AR1 points to vector b
         LDI    N,RC            ; RC contains the number of elements
         CALL   DOT
*
* SUBROUTINE DOT
*
* EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
* THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
* BE GREATER THAN OR EQUAL TO 2.
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+-----------------------
*   AR0      | ADDRESS OF a(0)
*   AR1      | ADDRESS OF b(0)
*   RC       | LENGTH OF VECTORS (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC
* REGISTER MODIFIED: R0
* REGISTER CONTAINING RESULT: R0
*
         .global DOT
*
DOT      PUSH   ST              ; Save status register
         PUSH   R2              ; Use the stack to save R2's
         PUSHF  R2              ; bottom 32 and top 32 bits
         PUSH   AR0             ; Save AR0
         PUSH   AR1             ; Save AR1
         PUSH   RC              ; Save RC
         PUSH   RS
         PUSH   RE
         MPYF3  *AR0,*AR1,R0    ; Initialize R0: a(0) * b(0) -> R0
         SUBF   R2,R2,R2        ; Initialize R2
         SUBI   2,RC            ; Set RC = N-2
*
* DOT PRODUCT (1 <= i < N)
*
         RPTS   RC                       ; Set up the repeat single
         MPYF3  *++AR0(1),*++AR1(1),R0   ; a(i) * b(i) -> R0
||       ADDF3  R0,R2,R2                 ; a(i-1) * b(i-1) + R2 -> R2
*
         ADDF3  R0,R2,R0        ; a(N-1) * b(N-1) + R2 -> R0
*
* RETURN SEQUENCE
*
         POP    RE
         POP    RS
         POP    RC              ; Restore RC
         POP    AR1             ; Restore AR1
         POP    AR0             ; Restore AR0
         POPF   R2              ; Restore top 32 bits of R2
         POP    R2              ; Restore bottom 32 bits of R2
         POP    ST              ; Restore ST
         RETS                   ; Return
*
* end
*
         .end









12.2.1.2 Zero-Overhead Subroutine Calls

Two 'C40 instructions, link and jump (LAJ) and link and jump conditional (LAJcond), allow zero-overhead subroutine calls to be implemented on the 'C40. Unlike CALL and CALLcond, which push the value of PC + 1 onto the software stack, LAJ and LAJcond put the value of PC + 4 into the extended-precision register R11. The three instructions following LAJ or LAJcond are executed before control passes to the subroutine. These three instructions are subject to the same restrictions as the three instructions following a delayed branch. At the end of the subroutine, a delayed conditional branch (BcondD) that uses register addressing with R11 as the source performs a zero-overhead return.

For comparison, Example 12-3 shows the same dot product computed with a zero-overhead subroutine call.
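The order of events in a zero-overhead call can be traced with a small C model. This is a hypothetical sketch, not a simulator. laj_trace records the addresses that execute when an LAJ at laj_pc calls a subroutine at sub_pc. It assumes every instruction is one word, that the subroutine's last counted instruction is the delayed return (BUD R11), and that sub_len >= 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the executed addresses to trace[] and returns how many were written.
   trace must hold at least sub_len + 8 entries. */
size_t laj_trace(uint32_t laj_pc, uint32_t sub_pc, uint32_t sub_len,
                 uint32_t *trace, uint32_t *r11)
{
    size_t k = 0;
    trace[k++] = laj_pc;                        /* LAJ itself */
    *r11 = laj_pc + 4;                          /* PC + 4 -> R11 */
    for (uint32_t i = 1; i <= 3; i++)
        trace[k++] = laj_pc + i;                /* three instructions after LAJ */
    for (uint32_t i = 0; i < sub_len; i++)
        trace[k++] = sub_pc + i;                /* subroutine body, ending with BUD R11 */
    for (uint32_t i = 1; i <= 3; i++)
        trace[k++] = sub_pc + sub_len - 1 + i;  /* three instructions after BUD */
    trace[k++] = *r11;                          /* execution resumes at PC + 4 */
    return k;
}
```

In Example 12-3, the three instructions after LAJ are the LDI instructions that set up AR0, AR1, and RC, so they complete before DOT starts.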
Example 12-3. Zero-Overhead Subroutine Call (Dot Product)

*
* TITLE ZERO-OVERHEAD SUBROUTINE CALL (DOT PRODUCT)
*
* MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
* DOT PRODUCT OF TWO VECTORS.
*
         LAJ    DOT
         LDI    @blk0,AR0       ; AR0 points to vector a
         LDI    @blk1,AR1       ; AR1 points to vector b
         LDI    N,RC            ; RC contains the number of elements
*
* SUBROUTINE DOT
*
* EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
* THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
* BE GREATER THAN OR EQUAL TO 2.
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+-----------------------
*   AR0      | ADDRESS OF a(0)
*   AR1      | ADDRESS OF b(0)
*   RC       | LENGTH OF VECTORS (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC
* REGISTER MODIFIED: R0
* REGISTER CONTAINING RESULT: R0
*
         .global DOT
*
DOT      PUSH   ST              ; Save status register
         PUSH   R2              ; Use the stack to save R2's
         PUSHF  R2              ; bottom 32 and top 32 bits
         PUSH   AR0             ; Save AR0
         PUSH   AR1             ; Save AR1
         PUSH   RC              ; Save RC
         PUSH   RS
         PUSH   RE
         MPYF3  *AR0,*AR1,R0    ; Initialize R0: a(0) * b(0) -> R0
         SUBF   R2,R2,R2        ; Initialize R2
         SUBI   2,RC            ; Set RC = N-2
*
* DOT PRODUCT (1 <= i < N)
*
         RPTS   RC                       ; Set up the repeat single
         MPYF3  *++AR0(1),*++AR1(1),R0   ; a(i) * b(i) -> R0
||       ADDF3  R0,R2,R2                 ; a(i-1) * b(i-1) + R2 -> R2
*
         ADDF3  R0,R2,R0        ; a(N-1) * b(N-1) + R2 -> R0
*
* RETURN SEQUENCE
*
         POP    RE
         POP    RS
         POP    RC              ; Restore RC
         POP    AR1             ; Restore AR1
         POP    AR0             ; Restore AR0
         BUD    R11             ; Return
         POPF   R2              ; Restore top 32 bits of R2
         POP    R2              ; Restore bottom 32 bits of R2
         POP    ST              ; Restore ST
*
* end
*
         .end









12.2.2 Software Stack 



The location of the 'C40 software stack is determined by the contents of the stack pointer register (SP). The stack grows from low to high addresses, and enough memory should be reserved to accommodate the anticipated storage requirements. The stack is used not only by subroutine calls (CALL) and returns (RETS) but also inside a subroutine for temporary storage of registers, as shown in Example 12-2. SP always points to the last value pushed onto the stack.

The CALL and CALLcond instructions push the value of the program counter onto the stack, as does interrupt processing. RETScond and RETIcond then pop the stack and place the value in the program counter. The integer value of any register can be pushed onto and popped off the stack with the PUSH and POP instructions.

Two additional instructions, PUSHF and POPF, handle floating-point numbers. These instructions push and pop floating-point numbers for registers R0-R11. This feature is very useful for saving the extended-precision registers (see Example 12-2 and Example 12-3). You can use PUSH and PUSHF on the same register to save the lower 32 and upper 32 bits: PUSH saves the lower 32 bits; PUSHF, the upper 32 bits. To recover this extended-precision number, execute a POPF followed by a POP. It is important to perform the integer and floating-point PUSH and POP in the above order, because POPF forces the last eight bits of the extended-precision register to zero and the subsequent POP restores them.
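A small C model of a 40-bit register shows why the order matters. In this sketch (all names hypothetical), PUSH saves bits 31-0 and PUSHF saves bits 39-8. POPF restores bits 39-8 and clears bits 7-0. POP restores bits 31-0 and leaves bits 39-32 unchanged; this last point is an assumption about integer loads into extended-precision registers.

```c
#include <stdint.h>

#define LOW32 0xFFFFFFFFull     /* bits 31-0  */
#define HIGH8 0xFF00000000ull   /* bits 39-32 */

/* Upward-growing stack; sp indexes the last word pushed. */
struct sw_stack { uint32_t mem[64]; uint32_t sp; };

static void push_word(struct sw_stack *s, uint32_t w) { s->mem[++s->sp] = w; }
static uint32_t pop_word(struct sw_stack *s)          { return s->mem[s->sp--]; }

/* r holds a 40-bit extended-precision register in its low 40 bits. */
void push_i(struct sw_stack *s, uint64_t r)            /* PUSH: bits 31-0 */
{ push_word(s, (uint32_t)(r & LOW32)); }

void push_f(struct sw_stack *s, uint64_t r)            /* PUSHF: bits 39-8 */
{ push_word(s, (uint32_t)((r >> 8) & LOW32)); }

uint64_t pop_f(struct sw_stack *s)                     /* POPF: bits 7-0 forced to 0 */
{ return (uint64_t)pop_word(s) << 8; }

uint64_t pop_i(struct sw_stack *s, uint64_t r)         /* POP: bits 39-32 kept */
{ return (r & HIGH8) | pop_word(s); }
```

Restoring with POPF and then POP recovers all 40 bits. Reversing the pairing, so that POPF runs last, loses bits 7-0.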
The stack pointer (SP) can be read as well as written to. Multiple stacks for
different program segments may be easily created. SP is not initialized by 
the hardware during reset; therefore, it is important to remember to initialize 
its value so that SP points to a predetermined memory location. This avoids 
the problem of SP attempting to write into ROM or write over other useful 
data. 

12.2.3 Interrupt Service Routines 

There are two types of interrupts on the 'C40: maskable and nonmaskable. The maskable interrupts include internal and external interrupts. All interrupts are vectored and prioritized. The vector table for the various interrupts is located relative to the interrupt-vector table pointer (IVTP, shown in Section 3.2 on page 3-15). The nonmaskable interrupt (NMI) has the highest priority. Unlike other interrupts, NMI cannot be masked by its own mask or by the GIE bit in ST. It is temporarily masked during delayed branches and multicycle CPU operations.

When an interrupt occurs, the corresponding flag is set in the interrupt flag register (IIF, explained in subsection 3.1.10, page 3-12). For the nonmaskable interrupt, NMI processing begins once the NMI flag is set, provided the CPU is not executing a delayed branch or a multicycle operation. A maskable interrupt is serviced only if two conditions are met after its flag is set. First, the GIE bit in the ST must be set to enable maskable interrupts globally. Second, the corresponding bit must be set in the interrupt enable register (IIE, described in subsection 3.1.9, page 3-10), or in the IIF register for external interrupts. Pins IIOF(3-0) can serve either as general-purpose I/O or as external interrupt pins. To enable an external interrupt, you must therefore configure those pins as interrupt pins (using the IIF register). IIOF(3-0) pins configured as interrupt pins can also be configured (again in the IIF register) as either edge-triggered or level-triggered. You can also write to the IIF register, which makes it possible to force an interrupt by software or to clear interrupts without processing them.

The interrupt flag register can be read, and action can be taken depending on whether the interrupt has occurred. This is true even when the maskable interrupt is disabled. It is useful when an interrupt-driven interface is not implemented. Example 12-4 shows the case in which a subroutine is called when external interrupt 1 has not occurred.
Example 12-4. Use of Interrupts for Software Polling

*       TITLE INTERRUPT POLLING
*
        TSTB    40H,IIF         ; Test if interrupt 1 has occurred
        CALLZ   SUBROUTINE      ; If not, call subroutine



When interrupt processing begins, the program counter is pushed onto the stack, and the interrupt vector is loaded into the program counter. Interrupts are then disabled by setting GIE = 0, and the program continues from the address loaded in the program counter. Because all maskable interrupts are disabled, interrupt processing may proceed without further interruption. The exceptions are when the interrupt service routine re-enables interrupts or when the NMI occurs.

Except for very simple interrupt service routines, it is important to ensure that the processor context is saved during execution of this routine. The context must be saved before you execute the routine itself, and it must be restored after the routine is finished. This procedure is called context switching. Context switching is also useful for subroutine calls, especially when extensive use is made of the auxiliary and the extended-precision registers. Code examples of context switching and an interrupt service routine are provided in this section.

12.2.3.1 Context Switching 

Context switching is commonly required when processing a subroutine call or interrupt. It may be quite extensive or simple, depending on system requirements. For the 'C40, the program counter is automatically pushed onto the stack. Important information in other 'C40 registers must be saved explicitly with PUSH and PUSHF instructions. These registers include the status, auxiliary, and extended-precision registers. The status register should be saved first and restored last. This preserves the processor status without any further change caused by the other context-switching instructions.
Example 12-5 and Example 12-6 show saving and restoring of the 'C40 state. In both examples, the stack is used for saving the registers, and it expands towards higher addresses. If you don't want to use the stack pointed at by the SP, you can create a separate stack by using an auxiliary register as the stack pointer. Registers saved in these examples:

□ Status register (ST), which should be saved first and restored last

□ Extended-precision registers R0 through R11

□ Auxiliary registers AR0 through AR7

□ Data-page pointer (DP)

□ Index registers (IR0 and IR1)

□ Block-size register (BK)

□ Interrupt-related registers IIE, IIF, and DIE

□ Repeat-related registers RS, RE, and RC
Example 12-5. Context-Save for the TMS320C40

*       TITLE CONTEXT-SAVE FOR THE TMS320C40
*
        .global SAVE
*
*       CONTEXT SAVE ON SUBROUTINE CALL OR INTERRUPT.
*
SAVE:
        PUSH    ST        ; Save status register
*
*       SAVE THE EXTENDED-PRECISION REGISTERS
*
        PUSH    R0        ; Save the lower 32 bits of R0
        PUSHF   R0        ; and the upper 32 bits
        PUSH    R1        ; Save the lower 32 bits of R1
        PUSHF   R1        ; and the upper 32 bits
        PUSH    R2        ; Save the lower 32 bits of R2
        PUSHF   R2        ; and the upper 32 bits
        PUSH    R3        ; Save the lower 32 bits of R3
        PUSHF   R3        ; and the upper 32 bits
        PUSH    R4        ; Save the lower 32 bits of R4
        PUSHF   R4        ; and the upper 32 bits
        PUSH    R5        ; Save the lower 32 bits of R5
        PUSHF   R5        ; and the upper 32 bits
        PUSH    R6        ; Save the lower 32 bits of R6
        PUSHF   R6        ; and the upper 32 bits
        PUSH    R7        ; Save the lower 32 bits of R7
        PUSHF   R7        ; and the upper 32 bits
        PUSH    R8        ; Save the lower 32 bits of R8
        PUSHF   R8        ; and the upper 32 bits
        PUSH    R9        ; Save the lower 32 bits of R9
        PUSHF   R9        ; and the upper 32 bits
        PUSH    R10       ; Save the lower 32 bits of R10
        PUSHF   R10       ; and the upper 32 bits
        PUSH    R11       ; Save the lower 32 bits of R11
        PUSHF   R11       ; and the upper 32 bits
*
*       SAVE THE AUXILIARY REGISTERS
*
        PUSH    AR0       ; Save AR0
        PUSH    AR1       ; Save AR1
        PUSH    AR2       ; Save AR2
        PUSH    AR3       ; Save AR3
        PUSH    AR4       ; Save AR4
        PUSH    AR5       ; Save AR5
        PUSH    AR6       ; Save AR6
        PUSH    AR7       ; Save AR7
*
*       SAVE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        PUSH    DP        ; Save data page pointer
        PUSH    IR0       ; Save index register IR0
        PUSH    IR1       ; Save index register IR1
        PUSH    BK        ; Save block-size register
        PUSH    IIE       ; Save interrupt enable register
        PUSH    IIF       ; Save interrupt flag register
        PUSH    DIE       ; Save DMA interrupt enable register
        PUSH    RS        ; Save repeat start address
        PUSH    RE        ; Save repeat end address
        PUSH    RC        ; Save repeat counter
*
*       SAVE IS COMPLETE



Example 12-6. Context-Restore for the TMS320C40

*       TITLE CONTEXT-RESTORE FOR THE TMS320C40
*
        .global RESTR
*
*       CONTEXT RESTORE AT THE END OF A SUBROUTINE CALL OR INTERRUPT.
*
RESTR:
*
*       RESTORE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        POP     RC        ; Restore repeat counter
        POP     RE        ; Restore repeat end address
        POP     RS        ; Restore repeat start address
        POP     DIE       ; Restore DMA interrupt enable register
        POP     IIF       ; Restore interrupt flag register
        POP     IIE       ; Restore interrupt enable register
        POP     BK        ; Restore block-size register
        POP     IR1       ; Restore index register IR1
        POP     IR0       ; Restore index register IR0
        POP     DP        ; Restore data page pointer
*
*       RESTORE THE AUXILIARY REGISTERS
*
        POP     AR7       ; Restore AR7
        POP     AR6       ; Restore AR6
        POP     AR5       ; Restore AR5
        POP     AR4       ; Restore AR4
        POP     AR3       ; Restore AR3
        POP     AR2       ; Restore AR2
        POP     AR1       ; Restore AR1
        POP     AR0       ; Restore AR0
*
*       RESTORE THE EXTENDED-PRECISION REGISTERS
*
        POPF    R11       ; Restore the upper 32 bits and
        POP     R11       ; the lower 32 bits of R11
        POPF    R10       ; Restore the upper 32 bits and
        POP     R10       ; the lower 32 bits of R10
        POPF    R9        ; Restore the upper 32 bits and
        POP     R9        ; the lower 32 bits of R9
        POPF    R8        ; Restore the upper 32 bits and
        POP     R8        ; the lower 32 bits of R8
        POPF    R7        ; Restore the upper 32 bits and
        POP     R7        ; the lower 32 bits of R7
        POPF    R6        ; Restore the upper 32 bits and
        POP     R6        ; the lower 32 bits of R6
        POPF    R5        ; Restore the upper 32 bits and
        POP     R5        ; the lower 32 bits of R5
        POPF    R4        ; Restore the upper 32 bits and
        POP     R4        ; the lower 32 bits of R4
        POPF    R3        ; Restore the upper 32 bits and
        POP     R3        ; the lower 32 bits of R3
        POPF    R2        ; Restore the upper 32 bits and
        POP     R2        ; the lower 32 bits of R2
        POPF    R1        ; Restore the upper 32 bits and
        POP     R1        ; the lower 32 bits of R1
        POPF    R0        ; Restore the upper 32 bits and
        POP     R0        ; the lower 32 bits of R0
*
        POP     ST        ; Restore status register
*
*       RESTORE IS COMPLETE







12.2.3.2 Interrupt-Vector Table 

The interrupt-vector table (IVT, shown in Figure 3-8 on page 3-16) of the 'C40 is relocatable. The location of the IVT is given by the interrupt-vector table pointer (IVTP). The IVTP is a 32-bit expansion register that points to the base address of the IVT. The IVT is required to lie on a 512-word boundary, so the 9 LSBs of the IVTP should always be zero. The two instructions LDEP and LDPE read from and write to the expansion registers: the IVTP and the trap-vector table pointer (TVTP). Example 12-7 shows how to change the value of the IVTP; changing the value of the TVTP is similar. With this relocatable feature, one interrupt signal can be used for different services. In Example 12-7, the IVTP is reset in the external INT0 interrupt service routines EINT0A and EINT0B. After the value of the IVTP is changed, the CPU goes to a different interrupt service routine when the same interrupt signal occurs again.
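The alignment rule and the alternating-handler trick can be modeled in a few lines. This is a hypothetical C sketch, and its function names are illustrative. It treats a vector's address as IVTP plus the vector's offset in the table. The IIOF0 offset of 03H is inferred from the addresses 03H and 1003H given in Example 12-7.

```c
#include <stdbool.h>
#include <stdint.h>

/* The IVT must start on a 512-word boundary, so the 9 LSBs of IVTP are 0. */
static bool ivtp_is_aligned(uint32_t ivtp) {
    return (ivtp & 0x1FFu) == 0u;
}

/* Word address of the vector stored `offset` words into the IVT. */
static uint32_t vector_address(uint32_t ivtp, uint32_t offset) {
    return ivtp + offset;
}

/* Net effect of EINT0A and EINT0B: each ISR points IVTP at the other
   table, so successive IIOF0 interrupts alternate between handlers. */
static uint32_t next_ivtp(uint32_t ivtp) {
    return (ivtp == 0x0u) ? 0x1000u : 0x0u;
}
```

1000H is a multiple of 512 (8 x 512), so both tables used in the example satisfy the alignment rule.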
Example 12-7. Use of One Interrupt Signal for Two Different Services

*       TITLE USE OF ONE INTERRUPT SIGNAL FOR TWO DIFFERENT SERVICES
*
*       IN THIS EXAMPLE, THE ADDRESSES OF EINT0A AND EINT0B ARE IN
*       MEMORY LOCATIONS 03H AND 1003H, RESPECTIVELY. ASSUME THAT THE
*       IVTP HAS NOT BEEN CHANGED AFTER DEVICE RESET AND THAT THE
*       EXTERNAL INTERRUPT IIOF0 IS ENABLED. WHEN THE FIRST IIOF0
*       INTERRUPT SIGNAL COMES IN, THE EINT0A ROUTINE IS EXECUTED. IF
*       THE NEXT IIOF0 INTERRUPT SIGNAL OCCURS, THE EINT0B ROUTINE IS
*       EXECUTED, AND SO ON. THE EINT0A AND EINT0B ROUTINES TAKE TURNS
*       EXECUTING WHEN THE IIOF0 INTERRUPT SIGNAL OCCURS.
*
*       External IIOF0 interrupt service routine A
*
        .global EINT0A
EINT0A: LDI     1000H,R0        ; Change IVTP to point to 1000H
        LDPE    R0,IVTP
        RETI                    ; Return and enable interrupts
*
*       External IIOF0 interrupt service routine B
*
        .global EINT0B
EINT0B: LDI     0,R0            ; Change IVTP to point to 0
        LDPE    R0,IVTP
        RETI                    ; Return and enable interrupts



12.2.3.3 Interrupt Priority 

Interrupts on the 'C40 are automatically prioritized. This allows interrupts that occur simultaneously to be serviced in a predefined order. Infrequent but lengthy interrupt service routines may need to be interrupted by more frequently occurring interrupts. The GIE bit in ST is reset when the interrupt vector is taken. Such a nested interrupt therefore occurs only if it is the NMI or if interrupts are re-enabled in the interrupt service routine.
In Example 12-8, the interrupt service routine for INT2 temporarily modifies the interrupt enable register (IIE) and the interrupt flag register (IIF). These changes permit interrupt processing when an interrupt to INT0 or NMI (but no other interrupt) occurs. When the routine has finished processing, the IIE register is restored to its original state. Notice that the RETIcond instruction not only pops the next program counter address from the stack, but also restores the GIE and CF bits from the PGIE and PCF bits. This re-enables all interrupts that were enabled before the INT2 interrupt was serviced.

Example 12-8. Interrupt Service Routine

*       TITLE INTERRUPT SERVICE ROUTINE
*
        .global ISR2
*
ENABLE  .set    2000h
MASK    .set    9h
*
*       INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2
*
ISR2:   PUSH    ST              ; Save status register
        PUSH    DP              ; Save data page pointer
        PUSH    IIE             ; Save interrupt enable register
        PUSH    IIF             ; Save interrupt flag register
        PUSH    R0              ; Save lower 32 bits and
        PUSHF   R0              ; upper 32 bits of R0
        PUSH    R1              ; Save lower 32 bits and
        PUSHF   R1              ; upper 32 bits of R1
        LDI     0,IIE           ; Mask all internal interrupts
        LDI     MASK,R0
        AND     R0,IIF          ; Leave only INT0 (IIOF0) enabled
        OR      ENABLE,ST       ; Enable all interrupts
*
*       MAIN PROCESSING SECTION FOR ISR2
*
        XOR     ENABLE,ST       ; Disable all interrupts
        POPF    R1              ; Restore upper 32 bits and
        POP     R1              ; lower 32 bits of R1
        POPF    R0              ; Restore upper 32 bits and
        POP     R0              ; lower 32 bits of R0
        POP     IIF             ; Restore interrupt flag register
        POP     IIE             ; Restore interrupt enable register
        POP     DP              ; Restore data page pointer
        POP     ST              ; Restore status register
        RETI                    ; Return and enable interrupts
12.2.4 Delayed Branches

The 'C40 uses delayed branches to create single-cycle branching. The delayed branches operate like regular branches but do not flush the pipeline. Instead, the three instructions following a delayed branch are also executed. Besides delayed branches, the 'C40 uses several other instructions to avoid the pipeline flush: link and jump (LAJ), link and trap (LAT), delayed repeat block (RPTBD), and delayed return from interrupt or trap conditionally (RETIcondD). These are discussed in Section 6.3 on page 6-9 of the Program Flow Control chapter (Chapter 6). The only limitation is that none of the three instructions following a delayed branch can be a:



□ Branch (standard or delayed)

□ Branch and annul conditionally

□ Call to a subroutine

□ Link and jump instruction

□ Link and trap instruction

□ Return from a subroutine

□ Return from an interrupt or trap (standard or delayed)

□ Repeat instruction (standard or delayed)

□ TRAP instruction

□ IDLE instruction



Conditional delayed branches use the conditions that exist at the end of the instruction immediately preceding the delayed branch. Sometimes a branch is necessary in the flow of a program, but fewer than three instructions can be placed after the delayed branch. For faster execution, it is still advantageous to use a delayed branch. This is shown in Example 12-9, with a NOP taking the place of the third unused instruction. The tradeoff is more instruction words for less execution time.
Example 12-9. Delayed Branch Execution

*       TITLE DELAYED BRANCH EXECUTION
*
        LDF     *+AR1(5),R2     ; Load contents of memory to R2
        BGED    SKIP            ; If loaded number >= 0, branch (delayed)
        LDFN    R2,R1           ; If loaded number < 0, load it to R1
        SUBF    3.0,R1          ; Subtract 3 from R1
        NOP                     ; Dummy operation to complete delayed
                                ; branch
        MPYF    1.5,R1          ; Continue here if loaded number < 0
*
SKIP    LDF     R1,R3           ; Continue here if loaded number >= 0
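The following hypothetical C model of Example 12-9 shows that the three instructions after BGED run whether or not the branch is taken. It also shows that the branch decision uses the flags set by the LDF that precedes it. The function name `example_12_9` and its parameters are illustrative. `mem` stands for the value at *+AR1(5), and `r1` for R1 on entry.

```c
/* Hypothetical model of Example 12-9; returns the value left in R3. */
static float example_12_9(float mem, float r1) {
    float r2 = mem;                   /* LDF *+AR1(5),R2 sets the flags   */
    int taken = (r2 >= 0.0f);         /* BGED SKIP tests those flags      */
    /* The three instructions after BGED execute on both paths. */
    if (r2 < 0.0f)
        r1 = r2;                      /* LDFN R2,R1                       */
    r1 = r1 - 3.0f;                   /* SUBF 3.0,R1                      */
                                      /* NOP                              */
    if (!taken)
        r1 = r1 * 1.5f;               /* MPYF 1.5,R1 (fall-through only)  */
    return r1;                        /* SKIP: LDF R1,R3                  */
}
```

With `mem = 2.0` and `r1 = 10.0`, the branch is taken, but SUBF still executes, so R3 ends up as 7.0.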



12.2.5 Repeat Modes 

The 'C40 supports looping without any overhead. For that purpose, there 
are three instructions: RPTB and RPTBD repeat a block of code, and RPTS 
repeats a single instruction. The three control registers 

□ RS (Repeat Start address), 

□ RE (Repeat End address), and 

□ RC (Repeat Counter) 

contain the parameters that specify loop execution (refer to Section 6.1 on 
page 6-2 for a description of RPTB, RPTBD, and RPTS). Registers RS 
and RE are automatically set from the code, while RC must be set by the 
user, as shown in Example 12-10. 

Example 12-10. Use of Block Repeat to Find a Maximum or a Minimum

*
*       TITLE USE OF BLOCK REPEAT TO FIND A MAXIMUM OR A MINIMUM
*
*       THIS ROUTINE FINDS THE MAXIMUM OR THE MINIMUM OF N=147 NUMBERS
*
        LDI     145,RC          ; Initialize repeat counter to 147-2
                                ; (the first value is loaded separately)
        LDI     @ADDR,AR0       ; AR0 points to the beginning
                                ; of the array
        LDF     *AR0++(1),R0    ; Initialize MAX or MIN to the
                                ; first value
        BLT     LOOP2           ; If it is a negative array, find the
                                ; minimum
*
LOOP1   RPTB    MAX
        CMPF    *AR0++(1),R0    ; Compare number to the maximum
MAX     LDFLT   *-AR0(1),R0     ; If greater, this is a new maximum
        B       NEXT
*
LOOP2   RPTB    MIN
        CMPF    *AR0++(1),R0    ; Compare number to the minimum
MIN     LDFGT   *-AR0(1),R0     ; If smaller, this is a new minimum
*
NEXT


12.2.5.1 Block Repeat 



The 'C40 supports both standard and delayed repeat block instructions (RPTB and RPTBD). RPTB and RPTBD are the same except that the three instructions following RPTBD are not included in the loop (but are included in the RPTB loop). For RPTBD, the loop starts at the fourth instruction following RPTBD. The restrictions on these three instructions are the same as those on the three instructions following a delayed branch. Since RPTBD is a single-cycle instruction, it is very useful in making nested loops more efficient. Example 12-10 shows the use of the block repeat to find the maximum or the minimum value of 147 numbers. The elements of the array are either all positive or all negative numbers. Since the loop to be executed cannot be predetermined, the RPTBD instruction is not suitable here.
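Below is a hypothetical C model of the max/min search. It shows the counting that a block repeat requires. The first element seeds R0, so the block has to run n - 1 times, and a block repeat runs RC + 1 times. The function name `max_or_min` is illustrative.

```c
#include <stddef.h>

/* Hypothetical model of the max/min search; requires n >= 1.
   The sign of the first element selects a maximum or a minimum search. */
static float max_or_min(const float *a, size_t n) {
    const float *ar0 = a;
    float r0 = *ar0++;                      /* LDF *AR0++(1),R0           */
    int find_min = (r0 < 0.0f);             /* BLT LOOP2                  */
    for (size_t pass = 1; pass < n; ++pass) {  /* RC + 1 = n - 1 passes  */
        float x = *ar0++;                   /* CMPF *AR0++(1),R0          */
        if (find_min ? (x < r0) : (x > r0))
            r0 = x;                         /* conditional load: new extreme */
    }
    return r0;
}
```

For 147 numbers this gives 146 passes, so RC must hold 145.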



12.2.5.2 Specific Restrictions in the Block-Repeat Construct



Because the program counter is modified at the end of the loop according to the contents of registers RS, RE, and RC, no operation at the end of the loop should attempt to change the repeat counter or the program counter to a different value.

In principle, it is possible to nest repeat blocks. However, there is only one set of control registers: RS, RE, and RC. It is, therefore, necessary to save these registers before entering an inside loop and to restore these registers after completing the inside loop. It takes four cycles of overhead to save and restore these registers. Hence, it may sometimes be more economical to implement a nested loop the traditional way, with a register as a counter and a delayed branch, rather than with nested repeat blocks.
Example 12-11 shows an application of the delayed block repeat construct. In this example, an array of 64 elements is flipped over by exchanging the elements that are equidistant from the two ends of the array. In other words, if the original array is

a(1), a(2), ..., a(31), a(32), ..., a(64),

the final array after the rearrangement will be

a(64), a(63), ..., a(32), a(31), ..., a(1).

Because the exchange operation is done on two elements at the same time, it requires 32 operations. The repeat counter (RC) is initialized to 31. In general, if RC contains the number N, the loop will be executed N+1 times. The loop is defined by the fourth instruction following the RPTBD instruction and the EXCH label.



Example 12-11. Loop Using Delayed Block Repeat

*       TITLE LOOP USING DELAYED BLOCK REPEAT
*
*       THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT
*       ARE SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI     @ADDR,AR0       ; AR0 points to the beginning of the
                                ; array
        RPTBD   EXCH            ; Repeat RC+1 times between START and
                                ; EXCH
        LDI     AR0,AR1
        ADDI    63,AR1          ; AR1 points to the end of the array
        LDI     31,RC           ; Initialize repeat counter
*       >>>>>>>>                ; Loop starts here
START   LDI     *AR0,R0         ; Load one memory element in R0,
||      LDI     *AR1,R1         ; and the other in R1
EXCH    STI     R1,*AR0++(1)    ; Then, exchange their locations
||      STI     R0,*AR1--(1)
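The exchange loop can be checked with a hypothetical C model. It takes the RPTBD count rule, RC + 1 passes, and applies it to an even-length array with RC = n/2 - 1. The function name `exchange_symmetric` is illustrative. The parallel load pair and store pair are modeled as two reads followed by two writes.

```c
#include <stddef.h>

/* Hypothetical model of the symmetric exchange; n must be even and >= 2. */
static void exchange_symmetric(int *a, size_t n) {
    int *ar0 = a;                     /* AR0: beginning of the array    */
    int *ar1 = a + (n - 1);           /* AR1: end of the array          */
    long rc = (long)(n / 2) - 1;      /* 31 when n = 64                 */
    for (long k = 0; k <= rc; ++k) {  /* the block runs RC + 1 times    */
        int r0 = *ar0;                /* LDI *AR0,R0 || LDI *AR1,R1     */
        int r1 = *ar1;
        *ar0++ = r1;                  /* STI R1,*AR0++(1)               */
        *ar1-- = r0;                  /* || STI R0,*AR1--(1)            */
    }
}
```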
12.2.5.3 Single-Instruction Repeat

The single-instruction repeat uses control registers RS, RE, and RC in the same way as does the block repeat. The advantage over the block repeat is that the instruction is fetched only once, and then the buses are available for moving operands. One difference to note is that the single-instruction repeat construct is not interruptible, while block repeat is interruptible.

Example 12-12 shows an application of the repeat-single construct. In this example, the sum of the products of two arrays is computed. The arrays are not necessarily different. If the arrays are a(i) and b(i), and if each is of length N = 512, register R0 will contain, after computation, this quantity:

a(1)b(1) + a(2)b(2) + ... + a(N)b(N).

The first product is computed before the repeat, and RPTS executes the following instruction RC+1 times. The repeat count is therefore specified as 510 in the instruction, which gives 511 repetitions.

Example 12-12. Loop Using Single Repeat

*       TITLE LOOP USING SINGLE REPEAT
*
        LDI     @ADDR1,AR0      ; AR0 points to array a(i)
        LDI     @ADDR2,AR1      ; AR1 points to array b(i)
*
        LDF     0.0,R0          ; Initialize R0
*
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute first product
*
        RPTS    510                     ; Repeat 511 times
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute next product and
||      ADDF3   R1,R0,R0                ; accumulate the previous one
*
        ADDF    R1,R0           ; One final addition
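The software pipeline can be modeled in C. This hypothetical sketch uses the name `dot_pipelined` for illustration only. One product is formed before the repeat. Each repetition forms the next product while adding the previous one, and a final addition drains the pipeline. For n products, the repeated instruction must therefore execute n - 1 times.

```c
/* Hypothetical model of the single-repeat dot product; requires n >= 1. */
static float dot_pipelined(const float *a, const float *b, int n) {
    float r0 = 0.0f;                  /* LDF 0.0,R0                        */
    float r1 = a[0] * b[0];           /* MPYF3: first product              */
    for (int k = 1; k < n; ++k) {     /* n - 1 repetitions                 */
        float prev = r1;              /* ADDF3 reads R1 before MPYF3 writes */
        r1 = a[k] * b[k];             /* MPYF3: next product               */
        r0 = prev + r0;               /* || ADDF3 R1,R0,R0                 */
    }
    return r0 + r1;                   /* ADDF R1,R0: final addition        */
}
```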
12.2.6 Computed GOTOs to Select Subroutines at Runtime

Occasionally, it is convenient to select the subroutine to be executed at runtime rather than at assembly time. The 'C40's computed GOTO supports this selection. The computed GOTO is implemented by using the CALLcond instruction in the register addressing mode. This instruction uses the contents of the register as the address of the call. Example 12-13 shows the case of a task controller.



Example 12-13. Computed GOTO

*       TITLE COMPUTED GOTO
*       TASK CONTROLLER
*
*       THIS MAIN ROUTINE CONTROLS THE ORDER OF TASK EXECUTION (6 TASKS
*       IN THE PRESENT EXAMPLE). TASK0 THROUGH TASK5 ARE THE NAMES OF
*       SUBROUTINES TO BE CALLED. THEY ARE EXECUTED IN ORDER: TASK0,
*       TASK1, ..., TASK5. WHEN AN INTERRUPT OCCURS, THE INTERRUPT
*       SERVICE ROUTINE IS EXECUTED, AND THE PROCESSOR CONTINUES
*       WITH THE INSTRUCTION FOLLOWING THE IDLE INSTRUCTION. THIS
*       ROUTINE SELECTS THE TASK APPROPRIATE FOR THE CURRENT CYCLE,
*       CALLS THE TASK AS A SUBROUTINE, AND BRANCHES BACK TO THE IDLE
*       TO WAIT FOR THE NEXT SAMPLE INTERRUPT WHEN THE SCHEDULED TASK
*       HAS COMPLETED EXECUTION. IR0 HOLDS THE OFFSET FROM THE BASE
*       ADDRESS OF THE TASK TO BE EXECUTED. BIT 15 (SET COND BIT) OF
*       THE STATUS REGISTER (ST) SHOULD BE SET TO 1.
*
        LDI     5,IR0           ; Initialize IR0
        LDI     @ADDR,AR1       ; AR1 holds the base address
                                ; of the table
WAIT    IDLE                    ; Wait for the next interrupt
        ADDI    *+AR1(IR0),R1   ; Add the base address to the table
                                ; entry number
        SUBI    1,IR0           ; Decrement IR0
        LDILT   5,IR0           ; If IR0<0, reinitialize it to 5
        CALLU   R1              ; Execute appropriate task
        BR      WAIT
*
TSKSEQ  .word   TASK5           ; Address of TASK5
        .word   TASK4           ; Address of TASK4
        .word   TASK3           ; Address of TASK3
        .word   TASK2           ; Address of TASK2
        .word   TASK1           ; Address of TASK1
        .word   TASK0           ; Address of TASK0
ADDR    .word   TSKSEQ
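A computed GOTO corresponds to an indirect call through a table of function pointers. The hypothetical C sketch below models the task controller. The tasks, the `task_log` array, and `run_task_controller` are illustrative. Each loop pass stands for one sample interrupt waking the IDLE at WAIT. The sketch treats the table word found at AR1 + IR0 as the call target.

```c
#include <stddef.h>

/* Hypothetical call log so the execution order can be checked. */
static int task_log[16];
static size_t task_log_len = 0;

static void log_task(int id) {
    if (task_log_len < sizeof task_log / sizeof task_log[0])
        task_log[task_log_len++] = id;
}

static void task0(void) { log_task(0); }
static void task1(void) { log_task(1); }
static void task2(void) { log_task(2); }
static void task3(void) { log_task(3); }
static void task4(void) { log_task(4); }
static void task5(void) { log_task(5); }

typedef void (*task_fn)(void);

/* TSKSEQ is stored in reverse order, as in Example 12-13. */
static task_fn const tskseq[6] = { task5, task4, task3, task2, task1, task0 };

static void run_task_controller(int passes) {
    int ir0 = 5;                          /* LDI 5,IR0                   */
    for (int p = 0; p < passes; ++p) {    /* one pass per sample interrupt */
        task_fn r1 = tskseq[ir0];         /* table entry at AR1 + IR0    */
        ir0 -= 1;                         /* SUBI 1,IR0                  */
        if (ir0 < 0)
            ir0 = 5;                      /* LDILT 5,IR0                 */
        r1();                             /* CALLU R1                    */
    }
}
```

Because the table is stored in reverse order, counting IR0 down from 5 calls TASK0 through TASK5 in ascending order and then wraps around.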
12.3 Logical and Arithmetic Operations

The 'C40 instruction set supports both integer and floating-point arithmetic 
and logical operations. The basic functions of such instructions can be com- 
bined to form more complex operations. This section examines examples 
of these operations: 

□ Bit manipulation 

□ Block moves 

□ Byte and half-word manipulation 

□ Bit-reversed addressing 

□ Integer and floating-point division 

□ Square root 

□ Extended-precision arithmetic 

□ Floating-point format conversion between IEEE and 'C40 formats 

12.3.1 Bit Manipulation 

Instructions for logical operations, such as AND, OR, NOT, ANDN, and XOR, can be used together with the shift instructions for bit manipulation. A special instruction, TSTB, tests bits. TSTB performs the same operation as AND, but its result is used only to set the condition flags and is not written anywhere. Example 12-14 and Example 12-15 demonstrate the use of several instructions for bit manipulation and testing.

Example 12-14. Use of TSTB for Software-Controlled Interrupt

*       TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
*       IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY
*       RESETTING THE GIE BIT OF THE STATUS REGISTER. WHEN AN
*       INTERRUPT ARRIVES, IT IS STORED IN THE IIF REGISTER. THE
*       PRESENT EXAMPLE ACTIVATES THE INTERRUPT SERVICE ROUTINE INTR
*       WHEN IT DETECTS THAT INT2 HAS OCCURRED.
*
        TSTB    4,IIF           ; Check if bit 2 of IIF is set,
        CALLNZ  INTR            ; and, if so, call subroutine INTR
Example 12-15. Copy a Bit from One Location to Another

*       TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
*       BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2.
*       AR0 POINTS TO A LOCATION HOLDING I, AND IT IS ASSUMED THAT THE
*       NEXT MEMORY LOCATION HOLDS THE VALUE J.
*
        LDI     1,R0
        LSH     *AR0,R0         ; Shift 1 to align it with bit I
        TSTB    R1,R0           ; Test the I-th bit of R1
        BZD     CONT            ; If bit = 0, branch delayed
        LDI     1,R0
        LSH     *+AR0(1),R0     ; Align 1 with J-th location
        ANDN    R0,R2           ; If bit = 0, reset J-th bit of R2
        OR      R0,R2           ; If bit = 1, set J-th bit of R2
CONT
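Example 12-15 relies on the delay slots after BZD. The J-th bit of R2 is always cleared by ANDN, and it is then set by OR only when the branch falls through. The hypothetical C model below makes this visible. The function name `copy_bit` is illustrative, and `i` and `j` stand for the values stored at *AR0 and *+AR0(1).

```c
#include <stdint.h>

/* Hypothetical model of Example 12-15; i and j must be in 0..31. */
static uint32_t copy_bit(uint32_t r1, uint32_t r2, unsigned i, unsigned j) {
    uint32_t r0 = 1u << i;            /* LDI 1,R0 ; LSH *AR0,R0             */
    int z = (r1 & r0) == 0u;          /* TSTB R1,R0 sets Z when bit I is 0  */
    /* BZD CONT: the next three instructions run on both paths. */
    r0 = 1u << j;                     /* LDI 1,R0 ; LSH *+AR0(1),R0         */
    r2 &= ~r0;                        /* ANDN R0,R2 clears bit J            */
    if (!z)
        r2 |= r0;                     /* OR R0,R2, reached only if bit I is 1 */
    return r2;
}
```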







12.3.2 Block Moves 

Because the 'C40 directly addresses a large amount of memory, blocks of data or program code can be stored off-chip in slow memories and then loaded on-chip for faster execution. Data can also be moved from on-chip to off-chip memory for storage or for multiprocessor data transfers.

Such data transfers can be accomplished efficiently in parallel with CPU operations by using the DMA. The DMA operation is explained in detail in Chapter 9. An alternative to DMA is to perform data transfers under program control by using load and store instructions in a repeat mode. Example 12-16 shows the transfer of a block of 512 floating-point numbers from external memory to block 1 of the on-chip RAM.
Example 12-16. Block Move Under Program Control

*       TITLE BLOCK MOVE UNDER PROGRAM CONTROL
*
extern  .word   01000H
block1  .word   02FFC00H
*
        LDI     @extern,AR0     ; Source address
        LDI     @block1,AR1     ; Destination address
        LDF     *AR0++,R0       ; Load the first number
        RPTS    510             ; Repeat following instruction
                                ; 511 times
        LDF     *AR0++,R0       ; Load the next number, and...
||      STF     R0,*AR1++       ; store the previous one
        STF     R0,*AR1         ; Store the last number
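The block move uses the same software-pipelining pattern as the single-repeat dot product. One load comes before the repeat, each repetition pairs a load with a store of the previous value, and one store follows the repeat. This hypothetical C sketch (the function name `block_move` is illustrative) models that pattern. Both halves of the parallel pair see the old R0.

```c
#include <stddef.h>

/* Hypothetical model of a pipelined block copy; requires n >= 1. */
static void block_move(const float *src, float *dst, size_t n) {
    const float *ar0 = src;
    float *ar1 = dst;
    float r0 = *ar0++;                  /* LDF *AR0++,R0: first number   */
    for (size_t k = 1; k < n; ++k) {    /* n - 1 repetitions (511 for 512) */
        float loaded = *ar0++;          /* LDF *AR0++,R0: next number    */
        *ar1++ = r0;                    /* || STF R0,*AR1++: previous one */
        r0 = loaded;
    }
    *ar1 = r0;                          /* STF R0,*AR1: last number      */
}
```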



12.3.3 Byte and Half-Word Manipulation

The 'C40 provides a new set of instructions for byte and half-word access: LB(3,2,1,0), LBU(3,2,1,0), LH(1,0), LHU(1,0), LWL(0,1,2,3), LWR(0,1,2,3), MB(3,2,1,0), and MH(1,0). In applications such as image processing, it is often important to be able to manipulate packed data. For example, the pixels in color images are often represented by four 8-bit unsigned quantities (red, green, blue, and alpha), which are packed into a single 32-bit word. The byte and half-word instructions make it very easy to manipulate this packed data.

Example 12-17 shows the case of packing data from a half-word FIFO to 32-bit data memory. Example 12-18 shows the case of unpacking a 32-bit data array into a four-byte-wide data array, assuming that each 32-bit word contains four 8-bit unsigned numbers.
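The data layouts that the two examples produce can be written out in plain C. This hypothetical sketch models only the results described in the comments of Examples 12-17 and 12-18. It does not model the internal behavior of the LWL or LBU instructions. The function names are illustrative. The FIFO is modeled as an array here, whereas the example rereads a single FIFO address.

```c
#include <stddef.h>
#include <stdint.h>

/* Packing as described for Example 12-17: the first halfword read lands
   in the 16 LSBs of the word and the second in the 16 MSBs. */
static void pack_halfwords(const uint16_t *fifo, uint32_t *out, size_t size) {
    for (size_t k = 0; k < size; ++k) {
        uint32_t lo = fifo[2 * k];
        uint32_t hi = fifo[2 * k + 1];
        out[k] = lo | (hi << 16);
    }
}

/* Unpacking as in Example 12-18: byte n of each word goes to array n+1. */
static void unpack_bytes(const uint32_t *in, uint8_t *a1, uint8_t *a2,
                         uint8_t *a3, uint8_t *a4, size_t size) {
    for (size_t k = 0; k < size; ++k) {
        uint32_t w = in[k];
        a1[k] = (uint8_t)(w & 0xFFu);           /* byte 0 (LSBs), as LBU0 */
        a2[k] = (uint8_t)((w >> 8) & 0xFFu);    /* byte 1, as LBU1        */
        a3[k] = (uint8_t)((w >> 16) & 0xFFu);   /* byte 2, as LBU2        */
        a4[k] = (uint8_t)((w >> 24) & 0xFFu);   /* byte 3 (MSBs), as LBU3 */
    }
}
```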

Example 12-17. Use of Packing Data From Half-Word FIFO to 32-Bit Data Memory

*       TITLE USE OF PACKING DATA FROM HALF-WORD FIFO
*       TO 32-BIT DATA MEMORY
*
*       IN THIS EXAMPLE, EVERY TWO 16-BIT INPUT DATA VALUES
*       ARE PACKED INTO ONE 32-BIT DATA MEMORY WORD. THE LOOP SIZE
*       USED HERE IS THE ARRAY SIZE, NOT THE INPUT DATA LENGTH.
*
        RPTBD   PACK
        LDI     @fifo_adr,AR1   ; Load fifo address
        LDI     @array,AR2      ; Load data array address
        LDI     size-1,RC       ; Load array size
*       >>>>>>>>                ; Loop starts here
        LWL0    *AR1,R9         ; Pack 16 LSBs
        LWL1    *AR1,R9         ; Pack 16 MSBs
PACK    STI     R9,*AR2++(1)    ; Store the data



Example 12-18. Use of Unpacking 32-Bit Data Into Four-Byte-Wide Data Array

*       TITLE USE OF UNPACKING 32-BIT DATA INTO FOUR BYTE-WIDE
*       DATA ARRAY
*
*       THIS EXAMPLE ASSUMES THAT THE 32-BIT DATA CONTAINS FOUR 8-BIT
*       UNSIGNED DATA.
*
        LDI     @input_adr,AR0  ; Load input address
        LDI     @array1,AR1     ; Load output data array 1 address
        LDI     @array2,AR2     ; Load output data array 2 address
        RPTBD   UNPACK
        LDI     @array3,AR3     ; Load output data array 3 address
        LDI     @array4,AR4     ; Load output data array 4 address
        LDI     size-1,RC       ; Load array size
*       >>>>>>>>                ; Loop starts here
        LBU0    *AR0,R8         ; Unpack first byte
        STI     R8,*AR1++(1)
        LBU1    *AR0,R8         ; Unpack second byte
        STI     R8,*AR2++(1)
        LBU2    *AR0,R8         ; Unpack third byte
        STI     R8,*AR3++(1)
        LBU3    *AR0++(1),R8    ; Unpack fourth byte
UNPACK  STI     R8,*AR4++(1)



12.3.4 Bit-Reversed Addressing

The 'C40 can implement fast Fourier transforms (FFTs) with bit-reversed addressing. If the data to be transformed is in the correct order, the final result of the FFT is scrambled in bit-reversed order. To recover the frequency-domain data in the correct order, certain memory locations must be swapped. The bit-reversed addressing mode makes swapping unnecessary: the next time the data is accessed, the access is done in a bit-reversed manner rather than sequentially. On the 'C40, bit-reversed addressing can be implemented through both the CPU and the DMA.

In CPU bit-reversed addressing, IR0 holds a value equal to one-half the size of the FFT, if real and imaginary data are stored in separate arrays. During accessing, the auxiliary register is indexed by IR0, but with reverse carry propagation. Example 12-19 illustrates a 512-point complex FFT being moved from the place of computation (pointed at by AR0) to a location pointed at by AR1. In this example, the real and imaginary parts XR(i) and XI(i) of the data are not stored in separate arrays. Instead, they are interleaved as

XR(0), XI(0), XR(1), XI(1), ..., XR(N-1), XI(N-1).

Because of this arrangement, the length of the array is 2N instead of N, and IR0 is set to 512 instead of 256.

Example 12-19. Bit-Reversed Addressing

*
*       TITLE BIT-REVERSED ADDRESSING
*
*       THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT
*       COMPUTATION, POINTED AT BY AR0, TO A LOCATION POINTED AT
*       BY AR1. REAL AND IMAGINARY POINTS ARE ALTERNATING.
*
        LDI     512,IR0
        RPTBD   LOOP
        LDI     2,IR1
        LDI     511,RC            ; Repeat 511+1 times
        LDF     *+AR0(1),R1       ; Load first imaginary point
*
        LDF     *AR0++(IR0)B,R0   ; Load real value (and point
||      STF     R1,*+AR1(1)       ; to next location) and store
                                  ; the imaginary value
LOOP    LDF     *+AR0(1),R1       ; Load next imaginary point
||      STF     R0,*AR1++(IR1)    ; and store previous real value
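Reverse carry propagation can be modeled by adding the index with the carry rippling toward the LSB instead of the MSB. The hypothetical C sketch below assumes that IR0 is a power of two, as in Example 12-19, and its function names are illustrative. With IR0 = 4 it visits 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order for 8 points. With IR0 = 512 over the interleaved array, it touches only even offsets, that is, the real parts.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse-carry addition of a power-of-two step ir0. */
static uint32_t reverse_carry_add(uint32_t ar, uint32_t ir0) {
    uint32_t bit = ir0;
    while (bit != 0u && (ar & bit) != 0u) {
        ar &= ~bit;        /* 1 + 1 = 0 in this position ...          */
        bit >>= 1;         /* ... and the carry moves one bit right   */
    }
    return ar | bit;       /* bit == 0 means the carry fell off the LSB */
}

/* Offsets produced by `count` successive *ARn++(IR0)B accesses from 0. */
static void bit_reversed_offsets(uint32_t ir0, size_t count, uint32_t *out) {
    uint32_t ar = 0u;
    for (size_t k = 0; k < count; ++k) {
        out[k] = ar;
        ar = reverse_carry_add(ar, ir0);
    }
}
```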



In DMA bit-reversed addressing, two bits in the DMA control register enable bit-reversed addressing on DMA reads and DMA writes. The source address index register and the destination address index register define the size of the bit-reversed addressing. Their function is similar to that of the CPU index register IR0. For more detailed information about DMA operation, refer to Chapter 9.
12.3.5 Integer and Floating-Point Division

The 'C40 has a single-cycle instruction, RCPF, to generate an estimate of the reciprocal of a floating-point number. This estimate has the correct exponent, and the mantissa is accurate to the eighth binary place (the error of the mantissa is less than 2^-8). Often, this is a satisfactory estimate of the reciprocal of a floating-point number. In other cases, this estimate may be used as a seed for an algorithm that computes the reciprocal to even greater accuracy. The Newton-Raphson algorithm described later is one such case.

Although the 'C40 has no dedicated integer-divide instruction, its instruction
set supports an efficient division routine. In addition, a rough estimate of an
integer quotient can be obtained with the FLOAT, RCPF, and FIX
instructions.

12.3.5.1 Integer Division 

Division is implemented on the 'C40 by repeated subtractions using SUBC,
a special conditional subtract instruction. Consider the case of a 32-bit positive
dividend with i significant bits (and 32-i sign bits) and a 32-bit positive
divisor with j significant bits (and 32-j sign bits). Repeating the SUBC
instruction i-j+1 times produces a 32-bit result in which the lower i-j+1 bits are
the quotient and the upper 31-i+j bits are the remainder of the division.

SUBC implements binary division in the same manner as long division. The
divisor (assumed to be smaller than the dividend) is shifted left i-j times to
align it with the dividend. Then, using SUBC, the shifted divisor is subtracted
from the dividend. For each subtraction that does not produce a negative
result, the dividend is replaced by the difference, which is then shifted
left by one with a 1 placed in the LSB. If the difference is negative, the dividend
is simply shifted left by one. This operation is repeated i-j+1 times.
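The SUBC procedure can be checked with a small Python model. This is a sketch under stated assumptions: both operands are positive, the divisor has no more significant bits than the dividend, and `subc` and `subc_divide` are illustrative names, not part of any TI library.

```python
def subc(acc: int, divisor: int) -> int:
    """One conditional subtract-and-shift step, as described for SUBC."""
    diff = acc - divisor
    if diff >= 0:
        return (diff << 1) | 1      # keep the difference, shift left, put 1 in LSB
    return acc << 1                 # negative difference: just shift left


def subc_divide(dividend: int, divisor: int) -> tuple:
    """Divide positive integers with i-j+1 SUBC steps; returns (quotient, remainder)."""
    i = dividend.bit_length()       # significant bits of the dividend
    j = divisor.bit_length()        # significant bits of the divisor
    steps = i - j + 1
    aligned = divisor << (i - j)    # divisor shifted to line up with the dividend
    acc = dividend
    for _ in range(steps):
        acc = subc(acc, aligned)
    quotient = acc & ((1 << steps) - 1)   # lower i-j+1 bits
    remainder = acc >> steps              # remaining upper bits
    return quotient, remainder


print(subc_divide(33, 5))  # (6, 3)
```

For 33/5 the four steps leave 110110b in the accumulator: the lower four bits, 0110, are the quotient 6, and the upper bits, 11, are the remainder 3.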

As an example, consider the division of 33 by 5 using both long division and
the SUBC method. In this case, i = 6, j = 3, and the SUBC operation is
repeated 6 - 3 + 1 = 4 times.
LONG DIVISION (leading zeros of the 32-bit operands omitted):

             110      Quotient
          ------
    101 ) 100001
          -101
            1101
           -101
              11      Remainder

SUBC METHOD:

00000000000000000000000000100001   Dividend
00000000000000000000000000101000   Divisor (aligned)          (1st SUBC command)
                                   Negative difference
00000000000000000000000001000010   New Dividend + Quotient
00000000000000000000000000101000   Divisor                    (2nd SUBC command)
00000000000000000000000000011010   Difference (>0)
00000000000000000000000000110101   New Dividend + Quotient
00000000000000000000000000101000   Divisor                    (3rd SUBC command)
00000000000000000000000000001101   Difference (>0)
00000000000000000000000000011011   New Dividend + Quotient
00000000000000000000000000101000   Divisor                    (4th SUBC command)
                                   Negative difference
00000000000000000000000000110110   Final Result

In the final result, the lower 4 bits (0110) are the quotient and the upper
28 bits (...0011) are the remainder.



When the SUBC instruction is used, both the dividend and the divisor must
be positive. Example 12-20 shows an implementation of integer division in
which the sign of the quotient is handled properly. The last instruction before
returning sets the condition flags in case subsequent operations depend
on the sign of the result.






Example 12-20. Integer Division

*
* TITLE INTEGER DIVISION
*
* SUBROUTINE DIVI
*
* INPUTS:  SIGNED INTEGER DIVIDEND IN R0,
*          SIGNED INTEGER DIVISOR IN R1.
*
* OUTPUT:  R0/R1 INTO R0.
*
* REGISTERS USED: R0-R3, IR0, IR1
*
* OPERATION: 1. NORMALIZE DIVISOR WITH DIVIDEND
*            2. REPEAT SUBC
*            3. QUOTIENT IS IN LSBs OF RESULT
*
* CYCLES: 31-62 (DEPENDS ON AMOUNT OF NORMALIZATION)
*
        .globl  DIVI
*
SIGN    .set    R2
TEMPF   .set    R3
TEMP    .set    IR0
COUNT   .set    IR1
*
* DIVI - SIGNED DIVISION
*
DIVI:
*
* DETERMINE SIGN OF RESULT. GET ABSOLUTE VALUE OF OPERANDS.
*
        XOR     R0,R1,SIGN      ; Get the sign
        ABSI    R0
        ABSI    R1
        CMPI    R0,R1           ; Divisor > dividend ?
        BGTD    ZERO            ; If so, return 0
*
* NORMALIZE OPERANDS. USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT
* FOR DIVISOR, AND AS REPEAT COUNT FOR 'SUBC'. (THE FIRST THREE
* INSTRUCTIONS BELOW ARE ALSO THE DELAY SLOTS OF BGTD.)
*
        FLOAT   R0,TEMPF        ; Normalize dividend
        PUSHF   TEMPF           ; PUSH as float
        POP     COUNT           ; POP as int
        LSH     -24,COUNT       ; Get dividend exponent
        FLOAT   R1,TEMPF        ; Normalize divisor
        PUSHF   TEMPF           ; PUSH as float
        POP     TEMP            ; POP as int
        LSH     -24,TEMP        ; Get divisor exponent
        SUBI    TEMP,COUNT      ; Get difference in exponents
        LSH     COUNT,R1        ; Align divisor with dividend
*
* DO COUNT+1 SUBTRACT & SHIFTS.
*
        RPTS    COUNT
        SUBC    R1,R0
*
* MASK OFF THE LOWER COUNT+1 BITS OF R0
*
        SUBRI   31,COUNT        ; Shift count is (32 - (COUNT+1))
        LSH     COUNT,R0        ; Shift left
        NEGI    COUNT
        LSH     COUNT,R0        ; Shift right to get result
*
* CHECK SIGN AND NEGATE RESULT IF NECESSARY.
*
        NEGI    R0,R1           ; Negate result
        ASH     -31,SIGN        ; Check sign
        LDINZ   R1,R0           ; If set, use negative result
        CMPI    0,R0            ; Set status from result
        RETS
*
* RETURN ZERO.
*
ZERO:   LDI     0,R0
        RETS
        .end
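The overall flow of Example 12-20 (sign handling, exponent difference, COUNT+1 conditional subtractions, and masking) can be mirrored in Python as below. It is a behavioral sketch only: it uses `int.bit_length()` in place of the FLOAT/PUSHF/POP exponent extraction, assumes a nonzero divisor, and excludes -2**31, whose absolute value does not fit in 32 bits. The name `divi` is reused for readability and is not a TI library function.

```python
def divi(dividend: int, divisor: int) -> int:
    """Signed division with the quotient truncated toward zero (divisor != 0)."""
    negative = (dividend ^ divisor) < 0        # XOR R0,R1,SIGN: sign of the result
    a, b = abs(dividend), abs(divisor)         # ABSI R0 / ABSI R1
    if b > a:                                  # CMPI R0,R1 / BGTD ZERO
        return 0
    count = a.bit_length() - b.bit_length()    # difference in exponents
    acc, aligned = a, b << count               # LSH COUNT,R1: align divisor
    for _ in range(count + 1):                 # RPTS COUNT: COUNT+1 SUBC steps
        diff = acc - aligned
        acc = ((diff << 1) | 1) if diff >= 0 else (acc << 1)
    quotient = acc & ((1 << (count + 1)) - 1)  # keep only the lower COUNT+1 bits
    return -quotient if negative else quotient # NEGI / LDINZ


print(divi(-33, 5))  # -6
```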

If the dividend is less than the divisor and a fractional quotient is wanted,
first determine the desired accuracy of the quotient in bits. If the desired
accuracy is k bits, start by shifting the dividend left by k positions. Then
apply the algorithm described above, with i replaced by i + k. This assumes
that i + k is less than 32.
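A sketch of the fractional variant, assuming Python's integer `//` stands in for the SUBC routine (for positive operands both yield floor(dividend * 2**k / divisor)); the function name is hypothetical.

```python
def fractional_quotient(dividend: int, divisor: int, k: int) -> int:
    """k-bit fractional quotient of dividend/divisor, for 0 <= dividend < divisor."""
    if not 0 <= dividend < divisor:
        raise ValueError("expects 0 <= dividend < divisor")
    shifted = dividend << k                 # shift the dividend left by k positions
    if shifted.bit_length() >= 32:
        raise ValueError("i + k must be less than 32")
    return shifted // divisor               # stands in for the SUBC-based division


q = fractional_quotient(3, 5, 8)
print(q, q / 2**8)  # 153 0.59765625
```

The result is read as a fixed-point number with k fractional bits, so 153 with k = 8 represents 153/256, close to 3/5.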



12.3.5.2 Computation of Floating-Point Inverse and Division 

This section describes the single-cycle RCPF instruction (reciprocal of a
floating-point number) and an algorithm that extends the precision of the
mantissa of the reciprocal generated by RCPF. Floating-point division is then
obtained by multiplying the dividend by the reciprocal of the divisor.

The input to RCPF is assumed to be $v = v(\mathrm{man}) \times 2^{v(\mathrm{exp})}$. The output is
$x = x(\mathrm{man}) \times 2^{x(\mathrm{exp})}$. The value v(man) (or x(man)) is composed of three
fields: the sign bit v(sign), an implied nonsign bit, and the fraction field
v(frac).

The algorithm for RCPF uses these four rules:

1) If v > 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man).
   For the special case where the ten MSBs of v(man) = 01.00000000b,
   x(man) = $2 - 2^{-8}$ = 01.11111111b. In both cases, the 23 LSBs of
   x(frac) = 0.

2) If v < 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man).
   For the special case where the ten MSBs of v(man) = 10.00000000b,
   x(man) = $-1 - 2^{-8}$ = 10.11111111b. In both cases, the 23 LSBs of
   x(frac) = 0.

3) If v = 0 (v(exp) = -128), then x(exp) = 127 and
   x(man) = 01.1111111111111111111111111111111b.
   In other words, if v = 0, then x becomes the largest positive number
   representable in the extended-precision floating-point format. The
   overflow flag (V) is set to 1.

4) If v(exp) = 127, then x(exp) = -128 and x(man) = 0.
   The zero flag (Z) is set to 1.

The RCPF instruction gives an estimate of the reciprocal of a number. The
Newton-Raphson algorithm can be used to further extend the precision of
the mantissa. The algorithm is

x[n+1] = x[n](2.0 - v*x[n])

where v is the number for which the reciprocal is desired and x[0], the seed
for the algorithm, is given by RCPF. Every iteration of the algorithm doubles
the number of bits of accuracy in the mantissa. Using RCPF, accuracy starts
at eight bits. With one iteration, accuracy increases to 16 bits, and with the
second iteration, accuracy increases to 32 bits in the mantissa.
Example 12-21 shows the program to implement this algorithm on the 'C40. 

Example 12-21. Inverse of a Floating-Point Number With 32-Bit Mantissa Accuracy

*
* TITLE INVERSE OF A FLOATING-POINT NUMBER
*       WITH 32-BIT MANTISSA ACCURACY
*
* SUBROUTINE INVF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
* COMPUTATION IS COMPLETED, 1/v IS STORED IN R1.
*
* TYPICAL CALLING SEQUENCE:
*
*       LAJU    INVF
*       LDF     v,R0
*       NOP             <-- can be other non-pipeline-break
*       NOP             <-- instructions
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF
*            |     (UPON THE CALL)
*   R1       | 1/v (UPON THE RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R1, R2
* REGISTER CONTAINING RESULT: R1
* REGISTER USED FOR SUBROUTINE CALL: R11
*
* CYCLES: 8    WORDS: 8
*
        .global INVF
*
INVF:   RCPF    R0,R1           ; Get x[0] = the estimate of 1/v, R0 = v
        MPYF3   R1,R0,R2        ; v * x[0]
        SUBRF   2.0,R2          ; 2.0 - v * x[0]
        MPYF    R2,R1           ; End of first iteration (16 bits accuracy)
*
        BUD     R11             ; Delayed return to caller
        MPYF3   R1,R0,R2        ; v * x[1]
        SUBRF   2.0,R2          ; 2.0 - v * x[1]
        MPYF    R2,R1           ; End of second iteration (32 bits accuracy)
*
* R1 = 1/v, RETURN TO CALLER
        .end



12.3.6 Square Root 

In many applications, normalization of data values is necessary. Often, the
normalizing factor is the square root of another quantity. For example, given
a vector, the unit vector in the same direction can be found by dividing the
original vector by its length, which involves a division by a square root. The
'C40 provides a single-cycle instruction, RSQRF, that generates an estimate of
the reciprocal of the square root of a positive floating-point number. This
estimate has the correct exponent, and the mantissa is accurate to the eighth
binary place (the error of the mantissa is less than $2^{-8}$). Like the
algorithm for RCPF, the algorithm for RSQRF uses a set of rules; for RSQRF
there are three:

1) If v(exp) is even, then x(exp) = -(v(exp)/2) - 1 and
   x(man) = 2/sqrt(v(man)). For the special case where the ten MSBs of
   v(man) = 01.00000000b, x(man) = $2 - 2^{-8}$ = 01.11111111b.
   In both cases, the 23 LSBs of x(frac) = 0.

2) If v(exp) is odd, then x(exp) = -((v(exp) - 1)/2) - 1 and
   x(man) = sqrt(2/v(man)). The 23 LSBs of x(frac) = 0.

3) If v = 0 (v(exp) = -128), then x(exp) = 127 and
   x(man) = 01.1111111111111111111111111111111b.
   In other words, if v = 0, then x becomes the largest positive number
   representable in the extended-precision floating-point format. The
   overflow flag (V) is set to 1.

Once the RSQRF instruction gives the estimate of the reciprocal of the
square root, you can use the Newton-Raphson algorithm to further extend
the precision of the mantissa. The algorithm is

x[n+1] = x[n](1.5 - (v/2)*x[n]*x[n])

where v is the number whose reciprocal square root is desired and x[0], the
seed for the algorithm, is given by RSQRF. Every iteration of the algorithm
doubles the number of bits of accuracy in the mantissa. Using RSQRF, accuracy
starts at eight bits. With one iteration, accuracy increases to 16 bits, and
with the second iteration, accuracy increases to 32 bits in the mantissa.
Example 12-22 shows the program to implement this algorithm on the 'C40.
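As with the reciprocal, the iteration can be tried in Python. The RSQRF seed is again an assumption (1/sqrt(v) with its mantissa truncated to 8 fraction bits), not a bit-exact model of the instruction, and the function names are hypothetical. The last lines also show the final step discussed after Example 12-22: multiplying by v turns 1/sqrt(v) into sqrt(v).

```python
import math


def rsqrf_seed(v: float) -> float:
    """Stand-in for RSQRF: 1/sqrt(v) with the mantissa truncated to 8 fraction bits (assumed)."""
    m, e = math.frexp(1.0 / math.sqrt(v))
    return math.ldexp(math.trunc(m * 2**9) / 2**9, e)


def rcpsqrf(v: float, iterations: int = 2) -> float:
    """Refine the seed with x[n+1] = x[n] * (1.5 - (v/2) * x[n] * x[n]); v > 0."""
    half_v = 0.5 * v                 # Example 12-22 precomputes v/2 the same way
    x = rsqrf_seed(v)
    for _ in range(iterations):
        x = x * (1.5 - half_v * x * x)
    return x


v = 10.0
x = rcpsqrf(v)
print(x, v * x)   # 1/sqrt(v) and, after one more multiplication, sqrt(v)
```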



Example 12-22. Reciprocal of the Square Root of a Positive Floating-Point Number

*
* TITLE RECIPROCAL OF THE SQUARE ROOT OF A POSITIVE
*       FLOATING-POINT NUMBER
*
* SUBROUTINE RCPSQRF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
* COMPUTATION IS COMPLETED, 1/SQRT(v) IS STORED IN R1.
*
* TYPICAL CALLING SEQUENCE:
*
*       LDF     v,R0
*       LAJU    RCPSQRF
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL SQUARE ROOT OF
*            |     (UPON THE CALL)
*   R1       | 1/SQRT(v) (UPON THE RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2
* REGISTER CONTAINING RESULT: R1
* REGISTER USED FOR SUBROUTINE CALL: R11
*
* CYCLES: 11    WORDS: 11
*
        .global RCPSQRF
*
RCPSQRF: RSQRF  R0,R1           ; Get x[0] = the estimate of 1/SQRT(v), R0 = v
        MPYF    0.5,R0          ; R0 = v/2
*
        MPYF3   R1,R1,R2        ; First iteration: x[0] * x[0]
        MPYF    R0,R2           ; (v/2) * x[0] * x[0]
        SUBRF   1.5,R2          ; 1.5 - (v/2) * x[0] * x[0]
        MPYF    R2,R1           ; End of first iteration (16 bits accuracy)
*
        MPYF3   R1,R1,R2        ; Second iteration: x[1] * x[1]
        BUD     R11             ; Delayed return to caller
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ; End of second iteration (32 bits accuracy)
*
* R1 = 1/SQRT(v), RETURN TO CALLER
        .end



The square root itself is then found by a simple multiplication, sqrt(v) = v*x[n],
where x[n] is the estimate of 1/sqrt(v) determined by the Newton-Raphson
algorithm or by another algorithm.
12.3.7 Extended-Precision Arithmetic

The TMS320C40 offers 32 bits of precision for integer arithmetic and 24 bits
of precision in the mantissa for floating-point arithmetic. For higher precision
in floating-point operations, the twelve extended-precision registers R0 to
R11 contain eight more bits of accuracy. Since no comparable extension is
available for fixed-point arithmetic, this section discusses how fixed-point
double precision can be achieved with the capabilities of the processor.
The technique consists of performing the arithmetic by parts and is similar
to the way longhand arithmetic is done.

In the instruction set, ADDC (add with carry) and SUBB (subtract
with borrow) use the status carry bit for extended-precision arithmetic. The
carry bit is affected by the arithmetic operations of the ALU and by the rotate
and shift instructions. It can also be manipulated directly by setting the
status register to certain values. For proper operation, the overflow mode bit
should be reset (OVM = 0) so that results are not replaced with saturation
values. Example 12-23 and Example 12-24 show 64-bit addition and 64-bit
subtraction. The first operand is stored in registers R0 (low word) and
R1 (high word). The second operand is stored in R2 (low word) and
R3 (high word). The result is stored in R0 and R1.
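The carry and borrow chaining of Examples 12-23 and 12-24 can be modeled on 32-bit halves in Python. This sketch assumes OVM = 0 (no saturation), treats the carry bit as the unsigned carry-out (or borrow) of the low-word operation, and uses hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def addi_addc(x_lo, x_hi, y_lo, y_hi):
    """ADDI R2,R0 then ADDC R3,R1: returns the 64-bit sum as (lo, hi) words."""
    total = x_lo + y_lo
    lo, carry = total & MASK32, total >> 32     # carry-out of the low word -> C
    hi = (x_hi + y_hi + carry) & MASK32          # ADDC adds C into the high word
    return lo, hi


def subi_subb(x_lo, x_hi, y_lo, y_hi):
    """SUBI R2,R0 then SUBB R3,R1: returns the 64-bit difference as (lo, hi) words."""
    borrow = 1 if x_lo < y_lo else 0             # unsigned borrow from the low word -> C
    lo = (x_lo - y_lo) & MASK32
    hi = (x_hi - y_hi - borrow) & MASK32         # SUBB subtracts C from the high word
    return lo, hi


def split(value):
    """Hypothetical helper: signed 64-bit value -> (lo, hi) 32-bit words."""
    value &= (1 << 64) - 1
    return value & MASK32, value >> 32


def join(lo, hi):
    """Hypothetical helper: (lo, hi) 32-bit words -> signed 64-bit value."""
    value = (hi << 32) | lo
    return value - (1 << 64) if value >> 63 else value


print(join(*addi_addc(*split(0xFFFFFFFF), *split(1))))  # 4294967296
```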



Example 12-23. 64-Bit Addition

*
* TITLE 64-BIT ADDITION
*
* TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER PRODUCING
* A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE
* ADDED, RESULTING IN W (R1,R0).
*
*       R1      R0
*     + R3      R2
*     ------------
*       R1      R0
*
        ADDI    R2,R0           ; Add low words; carry-out sets C
        ADDC    R3,R1           ; Add high words plus carry
Example 12-24. 64-Bit Subtraction

*
* TITLE 64-BIT SUBTRACTION
*
* TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER
* PRODUCING A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND
* Y (R3,R2) ARE SUBTRACTED, RESULTING IN W (R1,R0).
*
*       R1      R0
*     - R3      R2
*     ------------
*       R1      R0
*
        SUBI    R2,R0           ; Subtract low words; borrow sets C
        SUBB    R3,R1           ; Subtract high words and borrow

When two 32-bit numbers are multiplied, a 64-bit product results. To compute
it, the 'C40 provides a 32 x 32-bit multiplier and two special instructions,
MPYSHI (multiply signed integer and produce 32 MSBs) and MPYUHI (multiply
unsigned integer and produce 32 MSBs); MPYI supplies the 32 LSBs. Example
12-25 shows the implementation of a 32-bit by 32-bit multiplication.

Example 12-25. 32-Bit by 32-Bit Multiplication

*
* TITLE 32 x 32-BIT MULTIPLICATION
*
* TWO 32-BIT NUMBERS ARE MULTIPLIED, PRODUCING A 64-BIT RESULT.
* THE TWO NUMBERS X (R0) AND Y (R1) ARE MULTIPLIED, RESULTING
* IN W (R3,R2).
*
*               R0
*       x       R1
*     ------------
*       R3      R2
*
        MPYI3   R0,R1,R2        ; 32 LSBs of the product -> R2
        MPYSHI3 R0,R1,R3        ; 32 MSBs of the signed product -> R3
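A Python model of what the two instructions in Example 12-25 compute. It assumes, as described above, that MPYI returns the 32 LSBs of the product and that MPYSHI and MPYUHI return the 32 MSBs of the signed and unsigned 64-bit products; the function names mirror the mnemonics but are only illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    """Interpret a 32-bit word as a twos-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def mpyi(a: int, b: int) -> int:
    """32 LSBs of the product (identical for signed and unsigned operands)."""
    return (a * b) & MASK32


def mpyshi(a: int, b: int) -> int:
    """32 MSBs of the signed 64-bit product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


def mpyuhi(a: int, b: int) -> int:
    """32 MSBs of the unsigned 64-bit product."""
    return ((a & MASK32) * (b & MASK32)) >> 32


x, y = 0xFFFFFFFD, 5                       # -3 and 5 as 32-bit words
print(hex(mpyshi(x, y)), hex(mpyi(x, y)))  # 0xffffffff 0xfffffff1  (R3:R2 = -15)
```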



12.3.8 Floating-Point Format Conversion: IEEE to/from TMS320C40 

In fixed-point arithmetic, the binary point that separates the integer from the
fractional part of the number is fixed at a certain location. For example, if a
32-bit number has the binary point after the most significant bit (which is also
the sign bit), only fractional numbers (numbers with absolute values less
than 1) can be represented. Such a number has 31 fractional bits and is said
to be in Q31 format. All operations assume that the binary point is fixed
at this location. The fixed-point system, although simple to implement in
hardware, limits the dynamic range of the represented numbers. This causes
scaling problems in many applications. You can avoid
this difficulty by using floating-point numbers.
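As a small illustration of the Q31 format described above, the sketch below converts between a Python float and a 32-bit Q31 word. Truncation toward minus infinity is an assumption made for the sketch, and the function names are hypothetical.

```python
import math


def to_q31(x: float) -> int:
    """Encode -1 <= x < 1 as a 32-bit Q31 word (31 fractional bits)."""
    if not -1.0 <= x < 1.0:
        raise ValueError("Q31 represents only values in [-1, 1)")
    return math.floor(x * 2**31) & 0xFFFFFFFF    # twos-complement bit pattern


def from_q31(word: int) -> float:
    """Decode a 32-bit Q31 word back to a float."""
    signed = word - (1 << 32) if word & 0x80000000 else word
    return signed / 2**31


print(hex(to_q31(0.5)), hex(to_q31(-0.25)))  # 0x40000000 0xe0000000
```

Any value of magnitude 1 or more raises an error, which is exactly the dynamic-range limitation the text describes.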

A floating-point number consists of a mantissa m multiplied by base b raised
to an exponent e:

$m \times b^{e}$

In current hardware implementations, the mantissa is typically a normalized
number with an absolute value between 1 and 2, and the base is b = 2. Although
the mantissa is represented as a fixed-point number, the binary point of the
overall number floats because of the multiplication by $b^{e}$. The exponent e
is an integer whose value determines the position of the binary point in the
number. IEEE has established a standard format for the representation of
floating-point numbers.

To achieve higher efficiency in the hardware implementation, the 'C40 uses
a floating-point format that differs from the IEEE standard. However, the 'C40
has two single-cycle instructions, TOIEEE and FRIEEE, for format conversion.
These two instructions can also be executed in parallel with the STF
instruction, which allows the data format to be converted during a
memory-to-memory transfer. This subsection briefly describes the two formats
and presents example programs that convert between them.

TMS320C40 floating-point format:

     31       24   23   22                 0
    +-----------+----+--------------------+
    |     e     |  s |         f          |
    +-----------+----+--------------------+
       8 bits    1 bit      23 bits

In a 32-bit word representing a floating-point number, the first 8 bits
correspond to the exponent, expressed in twos-complement format. One bit is
for the sign, and 23 bits are for the mantissa. The mantissa is expressed in
twos-complement form with the binary point after the most significant nonsign
bit. Since this bit is the complement of the sign bit s, it is suppressed. In
other words, the mantissa actually has 24 bits. One special case occurs when
e = -128. In this case, the number is interpreted as zero, independently of
the values of s and f (which are by default set to zero). To summarize, the
values of the represented numbers in the 'C40 floating-point format are as
follows:

$$
\begin{cases}
2^{e} \times (01.f)_2 & \text{if } s = 0 \text{ and } e \neq -128 \\
2^{e} \times (10.f)_2 & \text{if } s = 1 \text{ and } e \neq -128 \\
0 & \text{if } e = -128
\end{cases}
$$
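The three cases above can be turned into a small decoder. This Python sketch follows the field layout described in this subsection (e in bits 31-24, s in bit 23, f in bits 22-0); the function name is hypothetical.

```python
import math


def c40_to_float(word: int) -> float:
    """Decode a 32-bit 'C40 short floating-point word (e | s | f)."""
    e = (word >> 24) & 0xFF
    if e & 0x80:
        e -= 256                     # exponent is twos complement
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    if e == -128:
        return 0.0                   # zero, whatever s and f hold
    frac = f / 2**23
    mantissa = 1.0 + frac if s == 0 else -2.0 + frac   # 01.f or 10.f
    return math.ldexp(mantissa, e)


for w in (0x00000000, 0xFF800000, 0x00C00000, 0x80000000):
    print(hex(w), c40_to_float(w))   # 1.0, -1.0, -1.5, 0.0
```

Note that 1.0 is encoded as all zeros, because zero itself is reserved for e = -128.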
IEEE floating-point format:

     31   30       23   22                 0
    +----+-----------+--------------------+
    |  s |     e     |         f          |
    +----+-----------+--------------------+
    1 bit   8 bits          23 bits

The IEEE floating-point format uses sign-magnitude notation for the mantissa
and an offset of 127 for the exponent. In a 32-bit word representing a
floating-point number, the first bit is the sign bit. The next 8 bits correspond
to the exponent, expressed in an offset-by-127 format (the actual exponent is
e - 127). The following 23 bits represent the absolute value of the mantissa
with the most significant 1 implied. The binary point is after this most
significant 1. In other words, the mantissa actually has 24 bits. There are
several special cases, summarized below.

These are the values of the represented numbers in the IEEE floating-point
format:

$(-1)^{s} \times 2^{e-127} \times (01.f)_2$ if 0 < e < 255

Special cases:

$(-1)^{s} \times 0.0$ if e = 0 and f = 0 (zero)

$(-1)^{s} \times 2^{-126} \times (0.f)_2$ if e = 0 and f <> 0 (denormalized)

$(-1)^{s} \times \infty$ if e = 255 and f = 0 (infinity)

NaN (not a number) if e = 255 and f <> 0

Based on these format definitions, the 'C40 performs the conversion in
hardware. The source data for the IEEE format is assumed to be in memory only;
for the 'C40 floating-point format, the source data can be in either memory or
an extended-precision register. The destination for both conversions must be
an extended-precision register. In a block memory transfer, the format
conversion costs no additional cycles when the conversion instruction is
executed in parallel with STF. Example 12-26 and Example 12-27 show data
format conversion during data transfers between a communication port and
internal RAM.
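For reference, the conversion that FRIEEE performs on ordinary values can be sketched in Python. This is not a bit-exact model of the instruction: it handles only normal IEEE values and zero, and it rejects denormals, infinities, and NaNs rather than reproducing what the hardware does with them. The function name is hypothetical.

```python
import math
import struct


def frieee_sketch(bits: int) -> int:
    """IEEE single-precision bit pattern -> 'C40 short floating-point word.

    Handles zero and normal numbers only; anything else raises ValueError.
    """
    e_field = (bits >> 23) & 0xFF
    if e_field == 0 and (bits & 0x7FFFFF) == 0:
        return 0x80000000                        # +0.0 or -0.0 -> 'C40 zero (e = -128)
    if e_field in (0, 255):
        raise ValueError("denormals, infinities and NaNs are outside this sketch")
    x = struct.unpack(">f", struct.pack(">I", bits))[0]
    m, e = math.frexp(abs(x))                    # |x| = m * 2**e, 0.5 <= m < 1
    if x > 0:
        mantissa, exp = 2.0 * m, e - 1           # 01.f form, in [1, 2)
        s, f = 0, int((mantissa - 1.0) * 2**23)
    else:
        if m == 0.5:                             # -2**k needs mantissa 10.000... = -2
            mantissa, exp = -2.0, e - 2
        else:
            mantissa, exp = -2.0 * m, e - 1      # 10.f form, in (-2, -1)
        s, f = 1, int((mantissa + 2.0) * 2**23)
    return ((exp & 0xFF) << 24) | (s << 23) | f


print(hex(frieee_sketch(0x3F800000)), hex(frieee_sketch(0xBF800000)))  # 0x0 0xff800000
```

The negative branch is where the two formats differ most: IEEE stores the magnitude, whereas the 'C40 stores a twos-complement mantissa in [-2, -1), so a negative power of two gets mantissa -2 and an exponent one lower.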
Example 12-26. IEEE to TMS320C40 Conversion Within Block Memory Transfer

*
* TITLE IEEE TO TMS320C40 CONVERSION WITHIN BLOCK MEMORY
*       TRANSFER
*
* PROGRAM ASSUMES THAT THE INPUT FIFO OF COMMUNICATION PORT 0
* IS FULL OF IEEE FORMAT DATA. EIGHT DATA WORDS ARE TRANSFERRED
* FROM COMMUNICATION PORT 0 TO INTERNAL RAM BLOCK 0, AND THE DATA
* FORMAT IS CONVERTED FROM IEEE FORMAT TO TMS320C40 FLOATING-
* POINT FORMAT.
*
        LDI     @CP0_IN,AR0         ; Load comm. port 0 input FIFO address
        LDI     @RAM0,AR1           ; Load internal RAM block 0 address
        FRIEEE  *AR0,R0             ; Convert first data
        RPTS    6                   ; Repeat next instruction 6+1 times
        FRIEEE  *AR0,R0             ; Convert next data
||      STF     R0,*AR1++(1)        ; Store previous data
        STF     R0,*AR1++(1)        ; Store last data



Example 12-27. TMS320C40 to IEEE Conversion Within Block Memory Transfer

*
* TITLE TMS320C40 TO IEEE CONVERSION WITHIN BLOCK MEMORY
*       TRANSFER
*
* PROGRAM ASSUMES THAT THE OUTPUT FIFO OF COMMUNICATION PORT 0
* IS EMPTY. EIGHT DATA WORDS ARE TRANSFERRED FROM INTERNAL RAM
* BLOCK 0 TO COMMUNICATION PORT 0, AND THE DATA FORMAT IS CONVERTED
* FROM TMS320C40 FLOATING-POINT FORMAT TO IEEE FORMAT.
*
        LDI     @CP0_OUT,AR0        ; Load comm. port 0 output FIFO address
        LDI     @RAM0,AR1           ; Load internal RAM block 0 address
        TOIEEE  *AR1++(1),R0        ; Convert first data
        RPTS    6                   ; Repeat next instruction 6+1 times
        TOIEEE  *AR1++(1),R0        ; Convert next data
||      STF     R0,*AR0             ; Store previous data
        STF     R0,*AR0             ; Store last data
12.4 Application-Oriented Operations

Certain features of the 'C40 architecture and instruction set facilitate the so- 
lution of numerically intensive problems. This section presents examples 
of applications that use these features, such as companding, filtering, matrix 
arithmetic, and fast Fourier transforms (FFT). 

12.4.1 Companding 

In telecommunications, one of the primary concerns is to conserve channel
bandwidth while preserving high speech quality. This is achieved by
quantizing the speech samples logarithmically. It has been demonstrated that
an 8-bit logarithmic quantizer produces speech quality equivalent to that of a
13-bit uniform quantizer. The logarithmic quantization is achieved by
companding (COMpressing/exPANDing). Two international standards have been
established for companding: µ-law (used in the United States and
Japan) and A-law (used in Europe). Detailed descriptions of µ-law and
A-law companding are presented in an application report on companding
routines included in the book Digital Signal Processing Applications with the
TMS320 Family (literature number SPRA012A).

During transmission, logarithmically compressed data in sign-magnitude
form is transmitted along the communications channel. If any processing
is necessary, the data should be expanded to a 14-bit (for µ-law) or 13-bit
(for A-law) linear format. This operation occurs when data is received at the
digital signal processor. After processing, and in order to continue
transmission, the result is compressed back to 8-bit format and transmitted
through the channel.

Example 12-28 and Example 12-29 show µ-law compression and
expansion (that is, linear to µ-law and µ-law to linear conversion), while
Example 12-30 and Example 12-31 show A-law compression and
expansion. For expansion, a look-up table is an alternative approach that
trades memory space for speed of execution. Since the compressed data
is 8 bits long, a table with 256 entries can be constructed to contain the
expanded data. If the compressed data is stored in register AR0, the
following two instructions put the expanded data in register R0:

        ADDI    @TABL,AR0       ; @TABL = BASE ADDRESS OF TABLE
        LDI     *AR0,R0         ; PUT EXPANDED NUMBER IN R0

The same look-up table approach could be used for compression, but the
required table length would then be 16,384 words for µ-law or 8,192 words
for A-law. If this memory size is not acceptable, use the subroutines
presented in Example 12-28 or Example 12-30.
Example 12-28. µ-Law Compression

*
* TITLE µ-LAW COMPRESSION
*
* SUBROUTINE MUCMPR
*
* TYPICAL CALLING SEQUENCE:
*
*       LAJU    MUCMPR
*       LDI     v,R0
*       NOP             <-- can be other non-pipeline-break
*       NOP             <-- instructions
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 15    WORDS: 15
*
        .global MUCMPR
*
MUCMPR  LSH3    -6,R0,R1        ; Save sign of number
        ABSI    R0,R0
        CMPI    1FDEH,R0        ; If R0 > 0x1FDE,
        LDIGT   1FDEH,R0        ; saturate the result
        ADDI    33,R0           ; Add bias
        FLOAT   R0              ; Normalize: (seg+5)0WXYZx...x
        MPYF    0.03125,R0      ; Adjust segment number by 2**(-5)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
        BUD     R11             ; Delayed return
        AND     080H,R1         ; Set sign bit
        ADDI    R1,R0           ; R0 = compressed number
        NOT     R0              ; Reverse all bits for transmission
Example 12-29. µ-Law Expansion

*
* TITLE µ-LAW EXPANSION
*
* SUBROUTINE MUXPND
*
* TYPICAL CALLING SEQUENCE:
*
*       LAJU    MUXPND
*       LDI     v,R0
*       NOP             <-- can be other non-pipeline-break
*       NOP             <-- instructions
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 14 (WORST CASE)    WORDS: 14
*
        .global MUXPND
*
MUXPND  NOT     R0,R0           ; Complement bits
        AND3    0FH,R0,R1       ; Isolate quantization bin
        LSH     1,R1
        ADDI    33,R1           ; Add bias to introduce 1xxxx1
        LSH3    -4,R0,R0        ; Isolate segment code
        TSTB    08H,R0          ; Test sign
        BZD     R11             ; If positive, delayed return
        AND     7,R0
        LSH3    R0,R1,R0        ; Shift and put result in R0
        SUBI    33,R0           ; Subtract bias
        BUD     R11             ; Delayed return
        NEGI    R0              ; Negate if a negative number
        NOP
        NOP
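The bit manipulations of Examples 12-28 and 12-29 can be cross-checked with the Python sketch below. It mirrors the steps of MUCMPR and MUXPND (saturation, bias of 33, segment from the normalized exponent, four-bit quantization bin, inverted code) on 14-bit linear samples. The function names are hypothetical, and `bit_length()` stands in for the FLOAT-based normalization.

```python
def mu_compress(v: int) -> int:
    """MUCMPR model: 14-bit signed linear sample -> 8-bit transmitted mu-law code."""
    sign = 0x80 if v < 0 else 0            # sign goes to bit 7
    biased = min(abs(v), 0x1FDE) + 33      # saturate at 0x1FDE, then add the bias of 33
    seg = biased.bit_length() - 6          # normalized exponent minus 5 -> segment 0..7
    wxyz = (biased >> (seg + 1)) & 0x0F    # the four bits after the leading 1
    return ~(sign | (seg << 4) | wxyz) & 0xFF   # all bits inverted for transmission


def mu_expand(code: int) -> int:
    """MUXPND model: 8-bit received mu-law code -> 14-bit signed linear sample."""
    c = ~code & 0xFF                       # undo the bit inversion
    quant_bin = c & 0x0F                   # quantization bin
    seg = (c >> 4) & 0x07                  # segment code
    magnitude = ((2 * quant_bin + 33) << seg) - 33   # 1xxxx1 shifted, bias removed
    return -magnitude if c & 0x80 else magnitude


print(mu_compress(1000), mu_expand(mu_compress(1000)))
```

Expansion returns the center of each quantization interval, so the round-trip error grows with the segment number; that is the logarithmic behavior companding relies on.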
Example 12-30. A-Law Compression

*
* TITLE A-LAW COMPRESSION
*
* SUBROUTINE ACMPR
*
* TYPICAL CALLING SEQUENCE:
*
*       LAJ     ACMPR
*       LDI     v,R0
*       NOP             <-- can be other non-pipeline-break
*       NOP             <-- instructions
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 17    WORDS: 17
*
        .global ACMPR
*
ACMPR   LSH3    -5,R0,R1        ; Save sign of number
        ABSI    R0,R0
        CMPI    1FH,R0          ; If R0 < 0x20,
        BLED    END             ; do linear coding
        CMPI    0FFFH,R0        ; If R0 > 0xFFF,
        LDIGT   0FFFH,R0        ; saturate the result
        LSH     -1,R0           ; Eliminate rightmost bit
        FLOAT   R0              ; Normalize: (seg+3)0WXYZx...x
        MPYF    0.125,R0        ; Adjust segment number by 2**(-3)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
END     BUD     R11             ; Delayed return
        AND     080H,R1         ; Set sign bit
        ADDI    R1,R0           ; R0 = compressed number
        XOR     0D5H,R0         ; Invert even bits for
                                ; transmission
Example 12-31. A-Law Expansion

*
* TITLE A-LAW EXPANSION
*
* SUBROUTINE AXPND
*
* TYPICAL CALLING SEQUENCE:
*
*       LAJU    AXPND
*       LDI     v,R0
*       NOP             <-- can be other non-pipeline-break
*       NOP             <-- instructions
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 19 (WORST CASE)    WORDS: 16
*
        .global AXPND
*
AXPND   XOR     0D5H,R0,R2      ; Invert even bits;
                                ; store in R2 for the sign test
        ASH3    -4,R2,R0        ; Isolate segment code
        AND     7,R0
        BZD     SKIP1
        AND3    0FH,R2,R1       ; Isolate quantization bin
        LSH     1,R1
        ADDI    1,R1            ; Create 0xxxx1
        ADDI    32,R1           ; Or 1xxxx1
        SUBI    1,R0
SKIP1   LSH3    R0,R1,R0        ; Shift and put result in R0
        TSTB    80H,R2          ; Test sign bit
        BZAT    R11             ; If positive, delayed return and
                                ; annul next three instructions
        NEGI    R0              ; Negate if a negative number
        NOP
        NOP
        BU      R11             ; Return
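A matching Python sketch for the A-law routines of Examples 12-30 and 12-31. It follows the listed steps (saturation at 0xFFF, dropping the rightmost bit, linear coding below 32, segment from the normalized exponent, and the 0D5H even-bit inversion) on 13-bit linear samples; the function names are hypothetical and `bit_length()` replaces the FLOAT normalization.

```python
def a_compress(v: int) -> int:
    """ACMPR model: 13-bit signed linear sample -> 8-bit transmitted A-law code."""
    sign = 0x80 if v < 0 else 0
    mag = min(abs(v), 0x0FFF) >> 1         # saturate at 0xFFF, drop the rightmost bit
    if mag < 16:                           # |v| <= 31: linear coding in segment 0
        code = mag
    else:
        seg = mag.bit_length() - 4         # normalized exponent minus 3 -> segment 1..7
        code = (seg << 4) | ((mag >> (seg - 1)) & 0x0F)
    return (sign | code) ^ 0xD5            # invert the even bits (and the sign bit)


def a_expand(code: int) -> int:
    """AXPND model: 8-bit received A-law code -> 13-bit signed linear sample."""
    c = code ^ 0xD5
    seg = (c >> 4) & 0x07
    magnitude = 2 * (c & 0x0F) + 1         # 0xxxx1
    if seg:
        magnitude = (magnitude + 32) << (seg - 1)   # 1xxxx1, shifted by seg - 1
    return -magnitude if c & 0x80 else magnitude


print(a_compress(0), a_expand(a_compress(1000)))
```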
12.4.2 FIR, IIR, and Adaptive Filters

Digital filters are a common requirement for digital signal processing
systems. There are two types of digital filters: finite impulse response (FIR)
and infinite impulse response (IIR). Each of these types can have either fixed
or adaptable coefficients. In this section, the fixed-coefficient filters are
presented first, and then the adaptive filters are discussed.

12.4.2.1 FIR Filters 

If the FIR filter has an impulse response h[0], h[1], ..., h[N-1], and x[n]
represents the input of the filter at time n, the output y[n] at time n is given
by this equation:

y[n] = h[0] x[n] + h[1]x[n-1] + ... + h[N-1] x[n-(N-1)] 

Two features of the 'C40 that facilitate the implementation of FIR filters
are parallel multiply/add operations and circular addressing. The first
allows a multiplication and an addition to be performed in a single machine
cycle, while the second makes a finite buffer of length N sufficient
for the data x.

Figure 12-1 shows the arrangement of the memory locations needed
to implement circular addressing, while Example 12-32 presents the
'C40 assembly code for an FIR filter.

Figure 12-1. Data Memory Organization for an FIR Filter

[Figure: the impulse response h(N-1), h(N-2), ..., h(1), h(0) is stored from low to high address. The input samples, from the oldest input x[n-(N-1)] to the newest input x(n), are stored in a circular queue, shown in its initial and final states.]




To set up circular addressing, initialize the block-size register BK to the
block length N. Also, the locations for signal x must start at a memory
location whose address is a multiple of the smallest power of 2 that is greater
than N. For instance, if N = 24, the first address for x should be a multiple
of 32 (the lower 5 bits of the beginning address should be zero). For the
reason behind this requirement, see Section 5.3, Circular Addressing, on
page 5-25.
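The alignment rule can be expressed as a short computation. This Python sketch, with hypothetical function names, returns the smallest power of 2 strictly greater than the block length and checks a candidate start address against it.

```python
def circular_alignment(n: int) -> int:
    """Smallest power of 2 strictly greater than the block length n (BK = n)."""
    return 1 << n.bit_length()


def is_valid_start(address: int, n: int) -> bool:
    """True if a circular buffer of length n may start at this address."""
    return address % circular_alignment(n) == 0


print(circular_alignment(24))  # 32
```

Because the power of 2 must be strictly greater than N, a block of exactly 32 words needs 64-word alignment, and the three-word biquad queue used later needs 4-word alignment.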

In Example 12-32, the pointer to the input sequence x is incremented and is
assumed to move from an older input to a newer input. At the end of
the subroutine, AR1 points to the position for the next input sample.
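A behavioral Python sketch of the circular-buffer FIR that Example 12-32 implements. It assumes the caller supplies one new sample per call and that this sample overwrites the oldest entry; the loop then walks the coefficients from h(N-1) to h(0) while the data index wraps modulo N, as *AR1++(1)% does with BK = N. Class and method names are hypothetical.

```python
class FirFilter:
    """y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(N-1)x(n-(N-1)) with an N-word circular queue."""

    def __init__(self, h):
        self.h = list(h)                   # h[0] .. h[N-1]
        self.n = len(self.h)
        self.x = [0.0] * self.n            # circular queue of input samples
        self.pos = 0                       # slot for the next input, like AR1 on return

    def step(self, sample: float) -> float:
        self.x[self.pos] = sample          # newest input overwrites the oldest one
        p = (self.pos + 1) % self.n        # now the oldest sample, x(n-(N-1))
        acc = 0.0
        for k in range(self.n - 1, -1, -1):        # h(N-1) first, as AR0 walks the table
            acc += self.h[k] * self.x[p]           # multiply and accumulate
            p = (p + 1) % self.n                   # circular increment modulo BK = N
        self.pos = (self.pos + 1) % self.n         # next input slot (the new oldest sample)
        return acc


f = FirFilter([1.0, 2.0, 3.0])
print([f.step(s) for s in (1.0, 0.0, 0.0, 0.0)])  # impulse response: [1.0, 2.0, 3.0, 0.0]
```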

Example 12-32. FIR Filter

*
* TITLE FIR FILTER
*
* SUBROUTINE FIR
*
* EQUATION: y(n) = h(0) * x(n) + h(1) * x(n-1) +
*                  ... + h(N-1) * x(n-(N-1))
*
* TYPICAL CALLING SEQUENCE:
*
*       load AR0
*       LAJU FIR
*       load AR1
*       load RC
*       load BK
*
* ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+--------------------------------
*   AR0      | ADDRESS OF h(N-1)
*   AR1      | ADDRESS OF x(n-(N-1))
*   RC       | LENGTH OF FILTER - 2 (N-2)
*   BK       | LENGTH OF FILTER (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC, BK
* REGISTERS MODIFIED: R0, R2, AR0, AR1, RC
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 7 + N    WORDS: 9
*
        .global FIR
*
* SET UP THE REPEAT CYCLE.
*
FIR     RPTBD   CONV                      ; Delayed repeat block
*
* INITIALIZE R0:
*
        MPYF3   *AR0++(1),*AR1++(1)%,R0   ; h(N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                    ; Initialize R2
        NOP
*
* FILTER (1 <= i < N)
*
CONV    MPYF3   *AR0++(1),*AR1++(1)%,R0   ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                  ; Multiply and add operation
        BUD     R11                       ; Delayed return
        ADDF    R0,R2,R0                  ; Add last product
        NOP
        NOP
        .end











12.4.2.2 IIR Filters 

The transfer function of the IIR filters has both poles and zeros. Its output 
depends on both the input and the past output. As a rule, the filters need less 
computation than an FIR with similar frequency response, but the filters 
have the drawback of being sensitive to coefficient quantization. Most often, 
the IIR filters are implemented as a cascade of second-order sections, 
called biquads. Example 12-33 and Example 12-34 show the implementa- 
tion for one biquad and for any number of biquads, respectively. 

For a single biquad, the output is

y[n] = a1 y[n-1] + a2 y[n-2] + b0 x[n] + b1 x[n-1] + b2 x[n-2]

However, the following two equations are more convenient and have smaller
storage requirements:

d[n] = a2 d[n-2] + a1 d[n-1] + x[n]
y[n] = b2 d[n-2] + b1 d[n-1] + b0 d[n]

Figure 12-2 shows the memory organization for this two-equation approach,
an implementation of a single biquad on the 'C40.

Figure 12-2. Data Memory Organization for a Single Biquad

[Figure: the filter coefficients a2, b2, a1, b1, b0 are stored from low to high address; the delay node values d(n), d(n-1), and d(n-2) are stored in a circular queue, with the newest and oldest delay values marked.]



As in the case of FIR filters, the address for the start of the values d must 
be a multiple of 4; i.e., the last two bits of the beginning address must be 
zero. The block-size register BK must be initialized to 3. 
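The two-equation form and the 3-word circular queue can be modelled in C as a sanity check. This is a sketch, not TI code. The type `biquad_t` and the function `biquad_step` are hypothetical names. The coefficient array uses the a2, b2, a1, b1, b0 order in which AR0 visits the coefficients in Example 12-33. The field `k` plays the role of AR1 (the index of d(n-2)) with BK = 3. The pointer movement follows the addressing in that example: the pointer moves down one slot for d(n-1), and d(n) is written over the oldest value.

```c
#include <assert.h>

/* Hypothetical model of one biquad in the d-form; not TI code. */
typedef struct {
    float c[5];   /* a2, b2, a1, b1, b0 in the order AR0 visits them */
    float d[3];   /* circular queue of delay node values, BK = 3 */
    int   k;      /* index of d(n-2), the role of AR1 */
} biquad_t;

float biquad_step(biquad_t *s, float x)
{
    assert(s->k >= 0 && s->k < 3);
    const float a2 = s->c[0], b2 = s->c[1], a1 = s->c[2], b1 = s->c[3], b0 = s->c[4];
    int k    = s->k;
    int km1  = (k + 2) % 3;               /* one slot down, circularly: d(n-1) */
    int knew = (k + 1) % 3;               /* two slots down: oldest value, reused for d(n) */
    float dn2 = s->d[k], dn1 = s->d[km1];
    float dn = a2 * dn2 + a1 * dn1 + x;   /* d(n) = a2 d(n-2) + a1 d(n-1) + x(n) */
    float y  = b2 * dn2 + b1 * dn1 + b0 * dn;
    s->d[knew] = dn;                      /* store d(n) over the oldest value */
    s->k = km1;                           /* old d(n-1) is the next call's d(n-2) */
    return y;
}
```

Fed an impulse, this model returns the same samples as the direct form y(n) = a1 y(n-1) + a2 y(n-2) + b0 x(n) + b1 x(n-1) + b2 x(n-2) with the same coefficients. That makes it a quick way to confirm that the two forms agree.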

Example 12-33. IIR Filter (One Biquad)

*
*   TITLE  IIR FILTER
*
*   SUBROUTINE IIR1
*
*   IIR1 == IIR FILTER (ONE BIQUAD)
*
*   EQUATIONS: d(n) = a2 * d(n-2) + a1 * d(n-1) + x(n)
*              y(n) = b2 * d(n-2) + b1 * d(n-1) + b0 * d(n)
*
*          OR  y(n) = a1*y(n-1) + a2*y(n-2) + b0*x(n) + b1*x(n-1)
*                     + b2*x(n-2)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load  R2
*       LAJU  IIR1
*       load  AR0
*       load  AR1
*       load  BK
*
*   ARGUMENT ASSIGNMENTS:
*       ARGUMENT | FUNCTION
*       ---------+---------------------------------------
*       R2       | INPUT SAMPLE x(n)
*       AR0      | ADDRESS OF FILTER COEFFICIENTS (a2)
*       AR1      | ADDRESS OF DELAY NODE VALUES (d(n-2))
*       BK       | BK = 3
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, BK
*   REGISTERS MODIFIED: R0, R1, R2, AR0, AR1
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 8    WORDS: 8
*
        .global IIR1
*
IIR1    MPYF3   *AR0,*AR1,R0              ; a2 * d(n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1   ; b2 * d(n-2) -> R1
        MPYF3   *++AR0(1),*AR1,R0         ; a1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                  ; a2*d(n-2) + x(n) -> R2
        MPYF3   *++AR0(1),*AR1--(1)%,R0   ; b1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                  ; a1*d(n-1) + a2*d(n-2)
                                          ; + x(n) -> R2
        BUD     R11                       ; Delayed return
        MPYF3   *++AR0(1),R2,R2           ; b0 * d(n) -> R2
||      STF     R2,*AR1++(1)%             ; Store d(n) and point
                                          ; to d(n-1)
        ADDF    R0,R2                     ; b1*d(n-1) + b0*d(n) -> R2
        ADDF    R1,R2,R0                  ; b2*d(n-2) + b1*d(n-1)
                                          ; + b0*d(n) -> R0
*
*   end
*
        .end

In the more general case, the IIR filter contains N>1 biquads. The equations 
for its implementation are given by the following pseudo-C language code: 

y[-1,n] = x[n]
for (i = 0; i < N; i++) {
    d[i,n] = a2[i] d[i,n-2] + a1[i] d[i,n-1] + y[i-1,n]
    y[i,n] = b2[i] d[i,n-2] + b1[i] d[i,n-1] + b0[i] d[i,n]
}
y[n] = y[N-1,n]

Figure 12-3 shows the corresponding memory organization, and
Example 12-34 shows the 'C40 assembly-language code.
Figure 12-3. Data Memory Organization for N Biquads

[Figure: the coefficients a2(i), b2(i), a1(i), b1(i), b0(i) of each biquad i = 0 ... N-1 are stored consecutively, ending with b0(N-1) at the high address; the delay node values d(i,n), d(i,n-1), d(i,n-2) of each biquad occupy a block of four locations, the fourth of which is empty.]
The block size register BK should be initialized to 3, and the beginning of
each set of d values (i.e., d[i,n], i = 0...N-1) should be at an address that
is a multiple of 4 (the last two bits zero), as stated in the case of a single
biquad.
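The pseudo-C above translates into the following C sketch. It assumes flat coefficient and delay arrays, with five coefficients and three delay values per section. The unused fourth word of each 4-word block in Figure 12-3 is omitted here. Because every section's queue rotates in step, one shared index `*k` stands in for the position that AR1 reaches in each block by adding IR0 = 4. The names are hypothetical, and this is a model of the data flow, not TI code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the data flow of Example 12-34; not TI code.
 * coef: nsec groups of {a2(i), b2(i), a1(i), b1(i), b0(i)}.
 * d:    nsec groups of 3 delay node values (each a circular queue).
 * *k:   index of d(i,n-2), the same in every group because all queues
 *       rotate in step. Returns y(n) = y(N-1,n). */
float iir_cascade(const float *coef, float *d, size_t nsec, int *k, float x)
{
    assert(nsec >= 2);                    /* the listing needs RC = N-2 >= 0 */
    int kk  = *k;
    int km1 = (kk + 2) % 3;               /* d(i,n-1): one slot down, circularly */
    int kn  = (kk + 1) % 3;               /* oldest slot, reused for d(i,n) */
    float y = x;                          /* y(-1,n) = x(n) */
    for (size_t i = 0; i < nsec; i++) {
        const float *c = coef + 5 * i;
        float *q = d + 3 * i;
        float dn2 = q[kk], dn1 = q[km1];
        float dn = c[0] * dn2 + c[2] * dn1 + y;      /* d(i,n) */
        y = c[1] * dn2 + c[3] * dn1 + c[4] * dn;     /* y(i,n) */
        q[kn] = dn;
    }
    *k = km1;                             /* old d(i,n-1) becomes d(i,n-2) */
    return y;
}
```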

Example 12-34. IIR Filters (N > 1 Biquads) 

*
*   TITLE  IIR FILTERS (N > 1 BIQUADS)
*
*   SUBROUTINE IIR2
*
*   EQUATIONS: y(-1,n) = x(n)
*
*              FOR (i = 0; i < N; i++)
*              {
*                 d(i,n) = a2(i) * d(i,n-2) + a1(i) * d(i,n-1) + y(i-1,n)
*                 y(i,n) = b2(i) * d(i,n-2) + b1(i) * d(i,n-1) + b0(i) * d(i,n)
*              }
*
*              y(n) = y(N-1,n)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load  R2
*       load  AR0
*       load  AR1
*       load  IR0
*       LAJU  IIR2
*       load  IR1
*       load  BK
*       load  RC
*
*   ARGUMENT ASSIGNMENTS:
*       ARGUMENT | FUNCTION
*       ---------+---------------------------------------------
*       R2       | INPUT SAMPLE x(n)
*       AR0      | ADDRESS OF FILTER COEFFICIENTS (a2(0))
*       AR1      | ADDRESS OF DELAY NODE VALUES (d(0,n-2))
*       BK       | BK = 3
*       IR0      | IR0 = 4
*       IR1      | IR1 = 4*N-4
*       RC       | NUMBER OF BIQUADS (N) - 2
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, IR0, IR1, BK, RC
*   REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 4 + 6N    WORDS: 15
*
        .global IIR2
*
IIR2    MPYF3   *AR0,*AR1,R0              ; a2(0) * d(0,n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1   ; b2(0) * d(0,n-2) -> R1
        RPTBD   LOOP                      ; Set loop for 1 <= i < N
        MPYF3   *++AR0(1),*AR1,R0         ; a1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                  ; First sum term of d(0,n).
        MPYF3   *++AR0(1),*AR1--(1)%,R0   ; b1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                  ; Second sum term of d(0,n).
        MPYF3   *++AR0(1),R2,R2           ; b0(0) * d(0,n) -> R2
||      STF     R2,*AR1--(1)%             ; Store d(0,n); point to d(0,n-2)
*
*   LOOP STARTS HERE
*
        MPYF3   *++AR0(1),*++AR1(IR0),R0  ; a2(i) * d(i,n-2) -> R0
||      ADDF3   R0,R2,R2                  ; First sum term of y(i-1,n).
                                          ; Pipeline hit on previous
                                          ; instruction
        MPYF3   *++AR0(1),*AR1--(1)%,R1   ; b2(i) * d(i,n-2) -> R1
||      ADDF3   R1,R2,R2                  ; Second sum term of y(i-1,n).
        MPYF3   *++AR0(1),*AR1,R0         ; a1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                  ; First sum term of d(i,n).
        MPYF3   *++AR0(1),*AR1--(1)%,R0   ; b1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                  ; Second sum term of d(i,n).
LOOP    MPYF3   *++AR0(1),R2,R2           ; b0(i) * d(i,n) -> R2
||      STF     R2,*AR1--(1)%             ; Store d(i,n); point to d(i,n-2)
*
*   FINAL SUMMATION
*
        BRD     R11                       ; Delayed return
        ADDF    R0,R2                     ; First sum term of y(N-1,n)
        ADDF3   R1,R2,R0                  ; Second sum term of y(N-1,n)
        LDI     *AR1--(IR1),R1            ; Return to first biquad
||      LDI     *AR1--(1)%,R2             ; Point to d(0,n-1)
*
*   end
*
        .end


12.4.2.3 Adaptive Filters (LMS Algorithm) 

In some digital signal processing applications, a filter must be adapted over
time to keep track of changing conditions. The book Theory and Design of
Adaptive Filters by Treichler, Johnson, and Larimore (Wiley-Interscience,
1987) presents the theory of adaptive filters. Although, in theory, both FIR
and IIR structures can be used as adaptive filters, the stability problems and
local optima that IIR filters exhibit make them less attractive for this
application. Hence, until further research makes IIR filters a better choice,
only FIR filters are used in the adaptive algorithms of practical applications.

In an adaptive FIR filter, the filtering equation takes this form: 

y[n] = h[n,0]x[n] + h[n,1]x[n-1]+...+ h[n,N-1]x[n-(N-1)] 

The filter coefficients are time-dependent. In a least-mean-squares (LMS) 
algorithm, the coefficients are updated by an equation in this form: 

h[n+1,i] = h[n,i] + b x[n-i],   i = 0, 1, ..., N-1
b is a constant for the computation; in Example 12-35 it is the scale factor
2 * mu * err, passed in R4. The updating of the filter coefficients can be
interleaved with the computation of the filter output so that both together
take 3 cycles per filter tap. The updated coefficients are written over the
old filter coefficients. Example 12-35 shows the implementation of an
adaptive FIR filter on the 'C40. The memory organization and the positioning
of the data in memory should follow the same rules as the FIR filter with
fixed coefficients above.
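The interleaving of filtering and coefficient update can be sketched in C as follows, under these assumptions:

- The name `lms_step` is hypothetical.
- `h_rev` holds h(n,N-1) first, as AR0 expects.
- `xbuf` and `*pos` model the circular buffer addressed by AR1 with BK = N.
- `b` is the scale factor 2 * mu * err passed in R4.

As in the listing, each tap's product uses the old coefficient before that coefficient is overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of Example 12-35; not TI code.
 * h_rev[0..N-1] = h(n,N-1), ..., h(n,0) (AR0); overwritten with h(n+1,.).
 * xbuf: circular buffer of length N; *pos = index of x(n-(N-1)) (AR1, BK = N).
 * b: the scale factor 2*mu*err passed in R4.
 * Returns y(n), computed with the old coefficients h(n,.). */
float lms_step(float *h_rev, const float *xbuf, size_t N, size_t *pos, float b)
{
    assert(N >= 2);                       /* RC = N-2 must be >= 0 */
    float acc = 0.0f;
    size_t p = *pos;
    for (size_t i = 0; i < N; i++) {
        float x = xbuf[p];
        acc += h_rev[i] * x;              /* filter with h(n,N-1-i) */
        h_rev[i] += b * x;                /* h(n+1,N-1-i) = h(n,N-1-i) + b*x(n-(N-1-i)) */
        p = (p + 1) % N;                  /* like *AR1++(1)% */
    }
    *pos = p;
    return acc;
}
```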

Example 12-35. Adaptive FIR Filter (LMS Algorithm) 

*
*   TITLE  ADAPTIVE FIR FILTER (LMS ALGORITHM)
*
*   SUBROUTINE LMS
*
*   LMS == LMS ADAPTIVE FILTER
*
*   EQUATIONS: y(n) = h(n,0)*x(n) + h(n,1)*x(n-1) + ...
*                     + h(n,N-1)*x(n-(N-1))
*
*              FOR (i = 0; i < N; i++) h(n+1,i) = h(n,i)
*                                                 + tmuerr * x(n-i)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load  R4
*       load  AR0
*       LAJU  LMS
*       load  AR1
*       load  RC
*       load  BK
*
*   ARGUMENT ASSIGNMENTS:
*       ARGUMENT | FUNCTION
*       ---------+--------------------------------
*       R4       | SCALE FACTOR (2 * mu * err)
*       AR0      | ADDRESS OF h(n,N-1)
*       AR1      | ADDRESS OF x(n-(N-1))
*       RC       | LENGTH OF FILTER - 2 (N-2)
*       BK       | LENGTH OF FILTER (N)
*
*   REGISTERS USED AS INPUT: R4, AR0, AR1, RC, BK
*   REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   PROGRAM SIZE: 12 words
*   EXECUTION CYCLES: 6 + 3N
*
        .global LMS
*
*   SETUP (i = 0)
*
LMS     RPTBD   LOOP                      ; Setup the delayed repeat block
        MPYF3   *AR0,*AR1,R0              ; Initialize R0:
||      SUBF3   R2,R2,R2                  ; h(n,N-1) * x(n-(N-1)) -> R0
                                          ; Initialize R2
        MPYF3   *AR1++(1)%,R4,R1          ; Initialize R1:
                                          ; x(n-(N-1)) * tmuerr -> R1
        ADDF3   *AR0++(1),R1,R1           ; h(n,N-1) + x(n-(N-1)) *
                                          ; tmuerr -> R1
*
*   FILTER AND UPDATE (1 <= i < N)
*
                                          ; Filter:
        MPYF3   *AR0--(1),*AR1,R0         ; h(n,N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                  ; Multiply and add operation.
                                          ; Update:
        MPYF3   *AR1++(1)%,R4,R1          ; x(n-(N-1-i)) * tmuerr -> R1
||      STF     R1,*AR0++(1)              ; R1 -> h(n+1,N-1-(i-1))
LOOP    ADDF3   *AR0++(1),R1,R1           ; h(n,N-1-i) + x(n-(N-1-i))
                                          ; * tmuerr -> R1
*
        BUD     R11                       ; Delayed return
        ADDF3   R0,R2,R0                  ; Add last product.
        STF     R1,*-AR0(1)               ; h(n,0) + x(n) * tmuerr
                                          ; -> h(n+1,0)
        NOP
*
*   end
*
        .end
12.4.3 Matrix-Vector Multiplication

In matrix-vector multiplication, a K x N matrix of elements m(i,j), having K
rows and N columns, is multiplied by an N x 1 vector to produce a K x 1
result. The multiplier vector has elements v(j), and the product vector has
elements p(i). Each one of the product-vector elements is computed by the
following expression:

p(i) = m(i,0) v(0) + m(i,1) v(1) + ... + m(i,N-1) v(N-1),   i = 0, 1, ..., K-1

This is essentially a dot product, and the matrix-vector multiplication
contains, as a special case, the dot product presented in Example 12-2 on page
12-10 and Example 12-3 on page 12-12. In pseudo-C format, the computation
of the matrix multiplication is expressed by

for (i = 0; i < K; i++) {
    p(i) = 0
    for (j = 0; j < N; j++)
        p(i) = p(i) + m(i,j) * v(j)
}

Figure 12-4 shows the data memory organization for matrix-vector
multiplication, and Example 12-36 shows the 'C40 assembly code to
implement it. Note that in Example 12-36, K (number of rows) should be
greater than 0, and N (number of columns) should be greater than 1.
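Below is a direct C rendering of the pseudo-C, useful as a reference when testing Example 12-36. It assumes row-major storage, as in Figure 12-4. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major K x N matrix m times N-vector v -> K-vector p, following the
 * pseudo-C above. Hypothetical helper, not TI code. */
void mat_vec(const float *m, const float *v, float *p, size_t K, size_t N)
{
    assert(K > 0 && N > 1);               /* the same limits as Example 12-36 */
    for (size_t i = 0; i < K; i++) {
        float acc = 0.0f;                 /* p(i) = 0 */
        for (size_t j = 0; j < N; j++)
            acc += m[i * N + j] * v[j];   /* p(i) = p(i) + m(i,j) * v(j) */
        p[i] = acc;
    }
}
```

In the assembly version, AR0 walks the matrix in the same row-major order. AR1 steps through v and is moved back by N (IR0) after each row, so every row restarts at v(0).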

Figure 12-4. Data Memory Organization for Matrix-Vector Multiplication

[Figure: matrix storage holds m(0,0) ... m(0,N-1), m(1,0), m(1,1), ... row by row from low to high address; input vector storage holds v(0), v(1), ..., v(N-1); result vector storage holds p(0) ... p(K-1).]
Example 12-36. Matrix Times a Vector Multiplication

*
*   TITLE  MATRIX TIMES A VECTOR MULTIPLICATION
*
*   SUBROUTINE MAT
*
*   MAT == MATRIX TIMES A VECTOR OPERATION
*
*   TYPICAL CALLING SEQUENCE:
*
*       load  AR0
*       load  AR1
*       load  AR2
*       load  AR3
*       load  R1
*       CALL  MAT
*
*   ARGUMENT ASSIGNMENTS:
*       ARGUMENT | FUNCTION
*       ---------+-------------------------------
*       AR0      | ADDRESS OF M(0,0)
*       AR1      | ADDRESS OF V(0)
*       AR2      | ADDRESS OF P(0)
*       AR3      | NUMBER OF ROWS - 1 (K-1)
*       R1       | NUMBER OF COLUMNS - 2 (N-2)
*
*   REGISTERS USED AS INPUT: AR0, AR1, AR2, AR3, R1
*   REGISTERS MODIFIED: R0, R2, AR0, AR1, AR2, AR3, IR0, RC
*
*   PROGRAM SIZE: 11
*   EXECUTION CYCLES: 5 + 7K + KN = 5 + ((N-1) + 8) * K
*
        .global MAT
*
*   SETUP
*
MAT     ADDI3   R1,2,IR0                  ; IR0 = N
*
*   FOR (i = 0; i < K; i++)  LOOP OVER THE ROWS.
*
ROWS    RPTBD   DOT                       ; Setup multiply a row by a
                                          ; column.
        LDI     R1,RC                     ; Set loop counter
        LDF     0.0,R2                    ; Initialize R2
        MPYF3   *AR0++(1),*AR1++(1),R0    ; m(i,0) * v(0) -> R0
*
*   FOR (j = 1; j < N; j++)  DO DOT PRODUCT OVER COLUMNS
*
DOT     MPYF3   *AR0++(1),*AR1++(1),R0    ; m(i,j) * v(j) -> R0
||      ADDF3   R0,R2,R2                  ; m(i,j-1) * v(j-1) + R2 -> R2
*
        DBD     AR3,ROWS                  ; Counts the number of rows left.
        ADDF    R0,R2                     ; Last accumulate
        STF     R2,*AR2++(1)              ; Result -> p(i)
        NOP     *--AR1(IR0)               ; Set AR1 to point to v(0)
*
*   !!! DELAYED BRANCH HAPPENS HERE !!!
*
*   RETURN SEQUENCE
*
        RETS                              ; Return
*
*   end
*
        .end



12.4.4 Fast Fourier Transforms (FFT)

Fourier transforms are an important tool often used in digital signal
processing systems. The purpose of the transform is to convert information
from the time domain to the frequency domain. The inverse Fourier transform
converts information back to the time domain from the frequency domain.
Computationally efficient implementations of Fourier transforms are known
as fast Fourier transforms (FFTs). The theory of FFTs can be found in books
such as DFT/FFT and Convolution Algorithms by C.S. Burrus and T.W. Parks
(John Wiley, 1985), and in the book Digital Signal Processing Applications
with the TMS320 Family.

Certain 'C40 features that increase the efficiency of numerically intensive
algorithms are particularly well suited to FFTs. The high speed of the device
(40-ns cycle time) makes the implementation of real-time algorithms easier,
while the floating-point capability eliminates the problems associated with
dynamic range. The powerful indexing scheme in indirect addressing
facilitates access to FFT butterfly legs that have different spans. The repeat
block implemented by the RPTB or RPTBD instruction reduces the looping
overhead in algorithms heavily dependent on loops (such as FFTs); this
construct gives the efficiency of in-line coding but has the form of a loop.
Because the FFT output is in scrambled (bit-reversed) order when the input is
in regular order, it must be restored to the proper order. This rearrangement
does not require extra cycles: the device has a special form of indirect
addressing (bit-reversed addressing mode) that can be used when the FFT
output is needed.
The 'C40 can use this mode on either the CPU or the DMA coprocessor. The
mode permits accessing the FFT output in the proper order. If a DMA transfer
with bit-reversed addressing is used, there is no overhead for data input and
output.
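Bit-reversed addressing adds the index register to the auxiliary register with the carry propagating in the reverse direction, from the most significant bit toward the least significant bit. The C sketch below emulates that reverse-carry addition. It then uses it the way the FFT listings in this section do, stepping through complex data stored as interleaved real/imaginary words with an index of N. The helper names are hypothetical. The emulation works on a field of `bits` bits, which assumes the buffer starts at an address whose low `bits` bits are zero.

```c
#include <assert.h>

/* Reverse the low `bits` bits of v. */
static unsigned rev_bits(unsigned v, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Reverse-carry addition over a `bits`-bit field: reverse both operands,
 * add normally, and reverse the (masked) sum back. */
unsigned rc_add(unsigned a, unsigned b, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    unsigned mask = (1u << bits) - 1u;
    return rev_bits((rev_bits(a, bits) + rev_bits(b, bits)) & mask, bits);
}

/* Copy n complex points (interleaved re, im) from in[] to out[] in
 * bit-reversed order, stepping the source offset the way
 * LDF *AR0++(IR0)B does with IR0 = n. n must be a power of two. */
void bitrev_copy(const float *in, float *out, unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < 2u * n) bits++;         /* 2n words in the buffer */
    assert((1u << bits) == 2u * n);
    unsigned src = 0;
    for (unsigned k = 0; k < n; k++) {
        out[2 * k]     = in[src];                 /* real part */
        out[2 * k + 1] = in[src + 1];             /* imaginary part */
        src = rc_add(src, n, bits);               /* *AR0++(IR0)B */
    }
}
```

With N = 8, the source word offsets come out as 0, 8, 4, 12, 2, 10, 6, 14. That is twice the 3-bit bit-reversed point index, which is why IR0 = N works for interleaved complex data.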

There are several types of FFTs: 

□ Radix-2 and radix-4 algorithms depending on the size of the FFT 
butterfly 

□ Decimation in time or frequency (DIT or DIF) 

□ Complex or real FFTs 

□ FFTs of different lengths, etc. 

The FFT examples in this section are based on programs contained in the
application report "An Implementation of FFT, DCT, and Other Transforms on
the TMS320C30" by Panos Papamichalis, in Digital Signal Processing
Applications with the TMS320 Family, Volume III.

Example 12-37 and Example 12-38 show the implementation of a complex,
radix-2, DIF FFT on the 'C40. Example 12-37 contains the generic FFT code,
which can be used with any FFT length. However, the complete implementation
of an FFT also needs a table of twiddle factors (sines/cosines), and this table
depends on the size of the transform. To retain the generic form of
Example 12-37, the table with the twiddle factors (containing 1-1/4 complete
cycles of a sine) is presented separately in Example 12-38 for the case of a
64-point FFT. A full cycle of the sine should have a number of points equal to
the FFT size. Example 12-38 also defines the FFT length N and M, the
logarithm of N to the base of the radix; M is the number of stages of the FFT.
For a 64-point FFT, M = 6 when using a radix-2 algorithm, or M = 3 when
using a radix-4 algorithm. If the table with the twiddle factors and the FFT
code are kept in separate files, they should be connected at link time.
Example 12-37. Complex, Radix-2, DIF FFT

*
*   TITLE  COMPLEX, RADIX-2, DIF FFT
*
*   GENERIC PROGRAM FOR A LOOPED-CODE RADIX-2 FFT COMPUTATION
*   IN 320C40
*
*   THE PROGRAM IS DERIVED FROM THE BURRUS AND PARKS
*   BOOK, "DFT/FFT AND CONVOLUTION ALGORITHMS", PAGE 111.
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY.
*   THE COMPUTATION IS DONE IN-PLACE, BUT THE
*   RESULT IS MOVED TO ANOTHER MEMORY SECTION TO
*   DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A
*   .DATA SECTION. THIS DATA IS INCLUDED IN A SEPARATE
*   FILE TO PRESERVE THE GENERIC NATURE OF THE PROGRAM.
*   FOR THE SAME PURPOSE, THE SIZE OF THE FFT N AND
*   LOG2(N) ARE DEFINED IN A .GLOBL DIRECTIVE AND
*   SPECIFIED DURING LINKING.
*
        .globl  FFT               ; Entry point for execution
        .globl  N                 ; FFT size
        .globl  M                 ; LOG2(N)
        .globl  SINE              ; Address of sine table

INP     .usect  "IN",1024         ; Memory with input data
        .bss    OUTP,1024         ; Memory with output data

        .text
*
*   INITIALIZE
*
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP
OUTPUT  .word   OUTP

FFT:    LDP     FFTSIZ            ; Command to load data page pointer
        LDI     @FFTSIZ,R7        ; R7 = N2
        LSH3    -2,R7,IR1         ; IR1 = N/4, pointer for SIN/COS table
        LDI     @LOGFFT,R9        ; R9 holds the remaining stage number
        LSH3    1,R7,IR0          ; IR0 = 2*N1 (because of real/imag)
        LDI     1,R8              ; Initialize repeat counter of first
                                  ; loop
        LDI     1,AR5             ; Initialize IE index (AR5 = IE)
        LDI     @INPUT,R10        ; R10 points to X(I)
*
*   OUTER LOOP
*
LOOP:   RPTBD   BLK1              ; Setup for first loop
        LDI     R10,AR0           ; AR0 points to X(I)
        ADDI    R7,AR0,AR2        ; AR2 points to X(L)
        SUBI3   1,R8,RC           ; RC should be one less than
                                  ; desired #
*
*   FIRST LOOP
*
        ADDF    *AR0,*AR2,R0      ; R0 = X(I) + X(L)
        SUBF    *AR2++,*AR0++,R1  ; R1 = X(I) - X(L)
        ADDF    *AR2,*AR0,R2      ; R2 = Y(I) + Y(L)
        SUBF    *AR2,*AR0,R3      ; R3 = Y(I) - Y(L)
        STF     R2,*AR0--         ; Y(I) = R2 and...
||      STF     R3,*AR2--         ; Y(L) = R3
BLK1    STF     R0,*AR0++(IR0)    ; X(I) = R0 and...
||      STF     R1,*AR2++(IR0)    ; X(L) = R1 and AR0,2 = AR0,2 + 2*n
*
*   IF THIS IS THE LAST STAGE, YOU ARE DONE
*
        SUBI    1,R9
        BZD     END
*
*   MAIN INNER LOOP
*
        LDI     2,AR1             ; Init loop counter for inner loop
        LDI     @SINTAB,AR4       ; Initialize IA index (AR4 = IA)
        ADDI    AR5,AR4           ; IA = IA + IE; AR4 points to sine
        ADDI    R10,AR1,AR0       ; (X(I),Y(I)) pointer
        ADDI    2,AR1             ; Increase inner loop counter
INLOP:  RPTBD   BLK2              ; Setup for second loop
        ADDI    R7,AR0,AR2        ; (X(L),Y(L)) pointer
        SUBI    1,R8,RC           ; RC should be one less than
                                  ; desired #
        LDF     *AR4,R6           ; R6 = SIN
*
*   SECOND LOOP
*
        SUBF    *AR2,*AR0,R2      ; R2 = X(I) - X(L)
        SUBF    *+AR2,*+AR0,R1    ; R1 = Y(I) - Y(L)
        MPYF    R2,R6,R0          ; R0 = R2*SIN and...
||      ADDF    *+AR2,*+AR0,R3    ; R3 = Y(I) + Y(L)
        MPYF    R1,*+AR4(IR1),R3  ; R3 = R1*COS and...
||      STF     R3,*+AR0          ; Y(I) = Y(I) + Y(L)
        SUBF    R0,R3,R4          ; R4 = R1*COS - R2*SIN
        MPYF    R1,R6,R0          ; R0 = R1*SIN and...
||      ADDF    *AR2,*AR0,R3      ; R3 = X(I) + X(L)
        MPYF    R2,*+AR4(IR1),R3  ; R3 = R2*COS and...
||      STF     R3,*AR0++(IR0)    ; X(I) = X(I) + X(L) and
                                  ; AR0 = AR0 + 2*N1
        ADDF    R0,R3,R5          ; R5 = R2*COS + R1*SIN
BLK2    STF     R5,*AR2++(IR0)    ; X(L) = R2*COS + R1*SIN,
                                  ; incr AR2 and...
||      STF     R4,*+AR2          ; Y(L) = R1*COS - R2*SIN
        CMPI    R7,AR1
        BNEAF   INLOP             ; Loop back to the inner loop
        ADDI    AR5,AR4           ; IA = IA + IE; AR4 points to sine
        ADDI    R10,AR1,AR0       ; (X(I),Y(I)) pointer
        ADDI    2,AR1             ; Increase inner loop counter
        LSH     1,R8              ; Increment loop counter for
                                  ; next time
        BRD     LOOP              ; Next FFT stage (delayed)
        LSH     1,AR5             ; IE = 2*IE
        LDI     R7,IR0            ; N1 = N2
        LSH     -1,R7             ; N2 = N2/2
*
*   STORE RESULT OUT USING BIT-REVERSED ADDRESSING
*
END:    LDI     @FFTSIZ,IR0       ; IR0 = size of FFT = N
        SUBI3   2,IR0,RC          ; RC = N - 2
        LDI     2,IR1
        RPTBD   BITRV
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        LDF     *+AR0(1),R0
*
*   BIT REVERSE LOOP
*
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
BITRV   LDF     *+AR0(1),R0
||      STF     R1,*AR1++(IR1)
*
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
        STF     R1,*AR1++(IR1)
SELF    BR      SELF              ; Branch to itself at the end
        .end
Example 12-38. Table with Twiddle Factors for a 64-Point FFT

*
*   TITLE  TABLE WITH TWIDDLE FACTORS FOR A 64-POINT FFT
*
*   FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT,
*   RADIX-2 FFT.
*
        .globl  SINE
        .globl  N
        .globl  M

N       .set    64
M       .set    6

        .data

SINE    .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185
COSINE  .float  1.000000
        .float  0.995185
        .float  0.980785
        .float  0.956940
        .float  0.923880
        .float  0.881921
        .float  0.831470
        .float  0.773010
        .float  0.707107
        .float  0.634393
        .float  0.555570
        .float  0.471397
        .float  0.382683
        .float  0.290285
        .float  0.195090
        .float  0.098017
        .float  0.000000
        .float  -0.098017
        .float  -0.195090
        .float  -0.290285
        .float  -0.382683
        .float  -0.471397
        .float  -0.555570
        .float  -0.634393
        .float  -0.707107
        .float  -0.773010
        .float  -0.831470
        .float  -0.881921
        .float  -0.923880
        .float  -0.956940
        .float  -0.980785
        .float  -0.995185
        .float  -1.000000
        .float  -0.995185
        .float  -0.980785
        .float  -0.956940
        .float  -0.923880
        .float  -0.881921
        .float  -0.831470
        .float  -0.773010
        .float  -0.707107
        .float  -0.634393
        .float  -0.555570
        .float  -0.471397
        .float  -0.382683
        .float  -0.290285
        .float  -0.195090
        .float  -0.098017
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185

The radix-2 algorithm has tutorial value because it is relatively easy to
understand how the FFT algorithm functions. However, radix-4
implementations can increase the speed of execution by reducing the overall
arithmetic required. Example 12-39 shows the generic implementation of a
complex, DIF FFT in radix-4. A companion table, like the one in
Example 12-38, should have a value of M equal to log N, where the base of
the logarithm is four.
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Example 12-39. Complex, Radix-4, DIF FFT 



TITLE COMPLEX, RADIX-4, DIF FFT 

GENERIC PROGRAM TO DO A LOOPED-CODE RADIX-4 FFT COMPUTATION IN 
THE TMS320C40. 

THE PROGRAM IS DERIVED FROM THE BURRUS AND PARKS BOOK, 
DFT/FFT AND CONVOLUTION ALGORITHMS, P. 117. THE 
(COMPLEX) DATA RESIDE IN INTERNAL MEMORY, AND THE 
COMPUTATION IS DONE IN-PLACE. 

THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A 
.DATA SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO 
PRESERVE THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME 
PURPOSE, THE SIZE OF THE FFT N AND LOG4 (N) ARE DEFINED IN A 
. GLOBL DIRECTIVE AND SPECIFIED DURING LINKING. 

IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE 
TWO MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED 
DURING STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE 
PROGRAM IN P. 117 OF THE BURRUS AND PARKS BOOK. 





.globl 


FFT 


• Entry point for execution 




.globl 


N 


• FFT size 




.globl 


M 


• LOG4 (N) 




.globl 


SINE 


• Address of sine table 


INP 


.usect 


"IN", 1024 


• Memory with input data 




.bss 


OUTP,1024 


• Memory with output data 




.text 






* 


INITIALIZE 




FFTSIZ 


.word 


N 


. fft size 


LOGFFT 


.word 


M 


• LOG4 (FFTSIZ) 


SINTAB 


.word 


SINE 


• Sine/cosine table base 


INPUT 


.word 


INP 


• Area with input data to process 


OUTPUT 


.word 


OUTP 


• Area with output data to process 


FFT: 


LDP 


FFT , 


• Command to load data page pointer 




LDI 


@FFTSIZ,BK 




LSH3 


1,BK, IRO 




; IR0=2*N1 (because of real/imag) 




LSH3 


-2,BK,IR1 


• IRl=N/4, pointer for SIN/COS table 




LDI 


1 , AR7 


r Initialize IE index 




LDI 


1,R8 


; Initialize repeat counter of first 








; loop 




ADDI 


2, IR1,R9 


; R9 = JT = RO/2 + 2 


LSH 


-1,BK 




; R7 = N2 
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* 

LOOP: 



OUTER LOOP 
LDI 
ADD I 
ADD I 
RPTBD 
ADD I 
SUBI3 

LDF 



@ INPUT, ARO 
BK, AR0,AR1 
BK, AR1, AR2 
BLK1 

BK, AR2, AR3 
1 , R8 , RC 

*+ARl,R0 



ARO points to X(I) 
AR1 points to X(I1) 
AR2 points to X(I2) 
Setup loop BLK1 
AR3 points to X(I3) 
RC should be one less 
than desired # 
RO « Y(I1) 



* 


FIRST LOOP: BLK1 






ADDF 




ss V/T1\ 4- V/T*3\ 

t t\J — I ill; t i \ ±o ) 




ADDF 




PI ss V (T\ 4- Y U0\ 

f £\X — X \ X. f i X \ X C. } 




ADDF 


R3 . Rl . R6 


Rfi ss PI 4- P"3 




SUBF 




r X\*4 X \ X ) X \ X f 




LDF 


*AR2 . R5 

JT\X\^, , l\w 


. dc _ y /to \ 


1 1 

1 1 


STF 


R6,*+AR0 t 


r Y (I) — Rl + R3 




SUBF 


R3,Rl , 


V Rl = Rl - R3 




ADDF 


*AR3 , *AR1 , R3 t 


• R3=X(I1) + X(I3) 




ADDF 


R5, *AR0,R1 


• Rl = X (I) + X (12) 


1 1 


STF 


R1,*+AR1 


• Y(I1) = Rl - R3 




ADDt 


"DO nl "o C 

Ko,Kx,Kb t 


• R6 = Rl + R3 




SUBF 


rj C St A T5 A T> A 

KO, *ARU, K4 / 


• R4 = X(I) - X(I2) 


1 1 

1 1 


STF 


Ro, *AR0++ (IR0) , 


' X ( I ) = Rl + R3 




SUBF 


R3,R1 


• Rl = Rl ~ R3 




SUBF 


*AR3,*AR1,R6 


• R6 = X(I1) - X(I3) 




SUBF 


R0,*+AR3,R3 


; -R3 = Y(I1) - Y(I3) 


1 1 


STF 


R1,*AR1++(IR0) 


; X(I1) = R1-R3 




SUBF 


R6,R4,R5 


? R5 = R4 - R6 




ADDF 


R6,R4 


• R2 = R4 + R6 




STF 


R5,*+AR2 


; Y(I2) = R4 - R6 


1 1 


STF 


R2,*+AR3 


? Y(I3) = R4 + R6 




SUBF 


R3,R2,R5 


r R5 = R2 - R3 




ADDF 


R3,R2 


? R2 = R2 + R3 




STF 


R2,*AR3++(IR0) 


; X(I3) = R2 + R3 


BLK1 


STF 


R5,*AR2++(IR0) 


? X(I2) = R2 - R3 


1 1 


LDF 


*+ARl,R0 


? RO = Y(I1) 


* 


IF THIS 


IS THE LAST STAGE, 


YOU ARE DONE 




CMP I 


IR1,R8 




BZD 


END 







* MAIN INNER LOOP 

LDI 1,R10 ; Init IA1 index 

LDI 2,R11 ; Init loop counter for inner loop 

LDI R11,AR0 

ADD I ©INPUT, ARO ; (X(I),Y(I)) pointer 

ADDI 2,R11 ; Increment inner loop counter 
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INLOP: ADD I R10,AR7 

ADD I BK, ARO, AR1 

CMPI R9,R11 

BZD SPCL 

ADD I BK,AR1,AR2 

ADD I BK, AR2, AR3 

SUBI3 1,R8,RC 

LDI R10,AR4 

ADD I @SINTAB, AR4 

ADD I AR4,R10,AR5 

SUB I 1,AR5 

RPTBD BLK2 

ADD I 10,AR5,AR6 

SUBI 1,AR6 

LDF *+AR2,R7 

* SECOND LOOP: BLK2 

ADDF R7,*+AR0,R3 

ADDF * +AR3 , * +AR1 , R5 

ADDF R5,R3,R6 

SUBF R7,*+AR0,R4 

SUBF R5,R3 

ADDF *AR2,*AR0,R1 

ADDF *AR3,*AR1,R5 

MPYF R3,*+AR5(IR1) , R6 

I | STF R6,*+AR0 

ADDF R5,R1,R0 

SUBF *AR2,*AR0,R2 

SUBF R5 , Rl 

MPYF R1,*AR5,R0 

I | STF R0,*AR0++(IR0) 

SUBF R0,R6 

SUBF *+AR3 , *+ARl , R5 

MPYF Rl , *+AR5 ( IR1 ) , RO 

I | STF R6, *+ARl 

MPYF R3,*AR5,R6 

ADDF R0,R6 

ADDF R5,R2,R1 

SUBF R5,R2 

SUBF *AR3,*AR1,R5 

SUBF R5,R4,R3 

ADDF R5,R4 

MPYF R3,*+AR4 (IR1) , R6 

I | STF R6,*AR1++(IR0) 

MPYF R1,*AR4,R0 

SUBF R0,R6 

MPYF Rl, *+AR4 (IR1) , R6 

I | STF R6,*+AR2 

MPYF R3,*AR4,R0 

ADDF R0,R6 

MPYF R4 , *+AR6 ( IR1 ) , R6 



IA1 = IA1 + IE 
(X(I1) , Y(I1) ) pointer 
If LPCNT - JT, go to 
special butterfly 
(X(I2) ,.Y(I2) ) pointer 
(X(I3) ,Y(I3) ) pointer 
RC should be one less than 
desired # 

; Create cosine index AR4 

; IA2 = IA1 + IA1 - 1 
; Setup loop BLK2 

/ IA3 - IA2 + IA1 - 1 
; R7 = Y(I2) 



R3 - Y(I) + Y(I2) 

R5 = Y(I1) + Y(I3) 

R6 = R3 + R5 

R4 = Y(I) - Y(I2) 

R3 = R3 - R5 

Rl = X(I) + X(I2) 

R5 = X(I1) + X(I3) 

R6 = R3*C02 

Y(I) = R3 + R5 

RO = Rl + R5 

R2 = X(I) - X(I2) 

Rl = Rl - R5 

RO = R1*SI2 

X(I) = Rl + R5 

R6 = R3*C02 - R1*SI2 

R5 - Y(I1) - Y(I3) 

RO = R1*C02 

Y(I1) = R3*C02 - Rl*SI2 

R6 = R3*SI2 

R6 = R1*C02 + R3*SI2 

Rl = R2 + R5 

R2 = R2 - R5 

R5 « X(I1) - X(I3) 

R3 = R4 - R5 

R4 = R4 + R5 

R6 = R3*C01 

X(I1) = R1*C02 + R3*SI2 
RO = R1*SI1 
R6 - R3*C01 - R1*SI1 
R6 = R1*C01 

Y(I2) = R3*C01 - R1*SI1 
RO = R3*SI1 
R6 = R1*C01 + R3*SI1 
R6 - R4*C03 
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I I 



BLK2 
I I 



STF R6, *AR2++(IR0) 

MPYF R2,*AR6,R0 

SUBF R0,R6 

MPYF R2 , *+AR6 ( IR1 ) , R6 

STF R6,*+AR3 

MPYF R4,*AR6,R0 

ADDF R0,R6 

STF R6, *AR3++ (IRO) 

LDF *+AR2,R7 

CMPI R11,BK 

BPD INLOP 

LDI R11,AR0 

ADD I @ INPUT, ARO 

ADD I 2,R11 

BRD CONT 

LSH 2,R8 

LSH 2,AR7 

LDI BK, IRO 



X(I2) - R1*C01 + R3*SI1 
RO = R2*SI3 
R6 = R4*C03 - R2*SI3 
R6 - R2*C03 

Y(I3) = R4*C03 - R2*SI3 
RO - R4*SI3 
R6 = R2*C03 + R4*SI3 
x(i3) = R2*C03 + R4*SI3 
Load next Y(I2) 

LOOP BACK TO THE INNER LOOP 

(X(I),Y(I)) pointer 
Increment inner loop counter 

Increment repeat counter for 
next time 
IE - 4*IE 
Nl - N2 



*
* SPECIAL BUTTERFLY FOR W=J
*
SPCL    RPTBD   BLK3                ; Setup loop BLK3
        LSH     -1,IR1,AR4          ; Point to SIN(45)
        ADDI    @SINTAB,AR4         ; Create cosine index AR4 = CO21
        LDF     *AR2,R7             ; R7 = X(I2)
*
* SPCL LOOP: BLK3
*
        ADDF    R7,*AR0,R1          ; R1 = X(I) + X(I2)
        ADDF    *+AR2,*+AR0,R3      ; R3 = Y(I) + Y(I2)
        SUBF    *+AR2,*+AR0,R4      ; R4 = Y(I) - Y(I2)
        ADDF    *AR3,*AR1,R5        ; R5 = X(I1) + X(I3)
        SUBF    R1,R5,R6            ; R6 = R5 - R1
        ADDF    R5,R1               ; R1 = R1 + R5
        ADDF    *+AR3,*+AR1,R5      ; R5 = Y(I1) + Y(I3)
        SUBF    R5,R3,R0            ; R0 = R3 - R5
        ADDF    R5,R3               ; R3 = R3 + R5
        SUBF    R7,*AR0,R2          ; R2 = X(I) - X(I2)
||      STF     R3,*+AR0            ; Y(I) = R3 + R5
        LDF     *AR3,R7             ; R7 = X(I3)
||      STF     R1,*AR0++(IR0)      ; X(I) = R1 + R5
        SUBF    R7,*AR1,R1          ; R1 = X(I1) - X(I3)
||      STF     R6,*+AR1            ; Y(I1) = R5 - R1
        SUBF    *+AR3,*+AR1,R3      ; R3 = Y(I1) - Y(I3)
        ADDF    R3,R2,R5            ; R5 = R2 + R3
        SUBF    R2,R3,R2            ; R2 = -R2 + R3
        SUBF    R1,R4,R3            ; R3 = R4 - R1
        ADDF    R1,R4               ; R4 = R4 + R1
        SUBF    R5,R3,R1            ; R1 = R3 - R5
        MPYF    R1,*AR4,R1          ; R1 = R1*CO21
||      STF     R0,*AR1++(IR0)      ; X(I1) = R3 - R5
        ADDF    R5,R3               ; R3 = R3 + R5
        MPYF    R3,*AR4,R3          ; R3 = R3*CO21
||      STF     R1,*+AR2            ; Y(I2) = (R3 - R5)*CO21
        SUBF    R4,R2,R1            ; R1 = R2 - R4
        MPYF    R1,*AR4,R1          ; R1 = R1*CO21
||      STF     R3,*AR2++(IR0)      ; X(I2) = (R3 + R5)*CO21
        ADDF    R4,R2               ; R2 = R2 + R4
        MPYF    R2,*AR4,R2          ; R2 = R2*CO21
||      STF     R1,*+AR3            ; Y(I3) = -(R4 - R2)*CO21
BLK3    LDF     *AR2,R7             ; Load next X(I2)
||      STF     R2,*AR3++(IR0)      ; X(I3) = (R4 + R2)*CO21
        CMPI    R11,BK
        BPD     INLOP               ; Loop back to the inner loop
        LDI     R11,AR0
        ADDI    @INPUT,AR0          ; (X(I),Y(I)) pointer
        ADDI    2,R11               ; Increment inner loop counter
        LSH     2,R8                ; Increment repeat counter for next time
        LSH     2,AR                ; IE = 4*IE
        LDI     BK,IR0              ; N1 = N2
CONT    BRD     LOOP                ; Next FFT stage (delayed)
        LSH     -2,BK               ; N2 = N2/4
        LSH3    -1,BK,R9
        ADDI    2,R9                ; JT = N2/2 + 2




*
* STORE RESULT OUT USING BIT-REVERSED ADDRESSING
*
END:    LDI     @FFTSIZ,IR0         ; IR0 = size of FFT = N
        SUBI3   2,IR0,RC            ; RC = N - 2
        LDI     2,IR1
        RPTBD   BITRV
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        LDF     *+AR0(1),R0
*
* BIT REVERSE LOOP
*
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
BITRV   LDF     *+AR0(1),R0
||      STF     R1,*AR1++(IR1)
*
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
        STF     R1,*AR1++(IR1)
SELF    BR      SELF                ; Branch to itself at the end.
        .end







Most often, the data to be transformed is a sequence of real numbers. In this 
case, the FFT exhibits symmetries that reduce the computational load even 
further. Example 12-40 shows a generic implementation of a real-valued, 
radix-2 FFT. For such an FFT, a length-N transform needs only N storage 
locations, whereas a complex FFT needs 2N. The remaining output points are 
recovered from the symmetry conditions. 
Example 12-40. Real, Radix-2 FFT 

*
*   TITLE REAL, RADIX-2 FFT
*
*   GENERIC PROGRAM TO DO A RADIX-2 REAL FFT COMPUTATION
*   IN 320C40.
*
*   THE PROGRAM IS DERIVED FROM THE PAPER BY SORENSEN ET AL.,
*   JUNE 1987 ISSUE OF THE TRANSACTIONS ON ASSP.
*
*   THE REAL DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS
*   DONE IN-PLACE. THE BIT-REVERSAL IS DONE AT THE BEGINNING OF
*   THE PROGRAM.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA
*   SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE
*   THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE
*   SIZE OF THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL
*   DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF
*   THE TABLE IS N/4 + N/4 = N/2.
*
        .globl  FFT                 ; Entry point for execution
        .globl  N                   ; FFT size
        .globl  M                   ; LOG2(N)
        .globl  SINE                ; Address of sine table

        .bss    INP,1024            ; Memory with input data

        .text
*
*   INITIALIZE
*
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP

FFT:    LDP     FFTSIZ              ; Command to load data page pointer
*
*   DO THE BIT-REVERSING AT THE BEGINNING
*
        LDI     @FFTSIZ,R8          ; R8 = N
        SUBI    1,R8,RC             ; RC should be one less than desired #
        LDI     @SINTAB,R9
        RPTBD   BITRV               ; Setup for BITRV loop
        LSH3    -1,R8,IR0           ; IR0 = half the size of FFT = N/2
        LDI     @INPUT,AR0          ; AR0 points to X(I)
        LDI     @INPUT,AR1          ; AR1 points to X(J)
*
*   DIGIT REVERSE COUNTER
*
        CMPI    AR1,AR0             ; Exchange locations only
                                    ;   if AR0 < AR1
        LDF     *AR0++(1),R1
||      LDF     *AR1++(IR0)B,R0
        LDFLT   *AR0,R0
        LDFLT   *AR1,R1
BITRV   STF     R0,*AR1
||      STF     R1,*AR0



*
*   LENGTH-TWO BUTTERFLIES
*
        LDI     @INPUT,AR0          ; AR0 points to X(I)
        RPTBD   BLK1                ; Setup for BLK1 loop
        SUBI3   1,IR0,RC            ; RC = (N/2) - 1
        LDF     *+AR0(1),R2         ; R2 = X(I + 1)
        LDI     2,IR0               ; IR0 = 2 = N2
*
*   BLK1 LOOP
*
        ADDF    R2,*AR0,R0          ; R0 = X(I) + X(I + 1)
        SUBF    R2,*AR0,R1          ; R1 = X(I) - X(I + 1)
        STF     R0,*AR0++(1)        ; X(I) = X(I) + X(I + 1)
BLK1    LDF     *+AR0(IR0),R2       ; Load next X(I)
||      STF     R1,*AR0++(1)        ; X(I + 1) = X(I) - X(I + 1)
*
*   FIRST PASS OF THE DO-20 LOOP (STAGE K = 2 IN DO-10 LOOP)
*
        LDI     @INPUT,AR0          ; AR0 points to X(I)
        RPTBD   BLK2                ; Setup for BLK2 loop
        LSH3    -2,R8,RC            ; Repeat N/4 times
        SUBI    1,RC                ; RC should be one less than desired #
        LDF     *+AR0(IR0),R2       ; R2 = X(I + 2)
*
*   BLK2 LOOP
*
        ADDF    R2,*AR0++(IR0),R0   ; R0 = X(I) + X(I + 2)
        SUBF    R2,*-AR0(IR0),R1    ; R1 = X(I) - X(I + 2)
        STF     R0,*-AR0(IR0)       ; X(I) = X(I) + X(I + 2)
        NEGF    *+AR0,R0            ; R0 = -X(I + 3)
||      STF     R1,*AR0++(IR0)      ; X(I + 2) = X(I) - X(I + 2)
BLK2    LDF     *+AR0(IR0),R2       ; Load next X(I + 2)
||      STF     R0,*-AR0            ; X(I + 3) = -X(I + 3)
*
*   MAIN LOOP (FFT STAGES)
*
        LSH3    -3,R8,IR0           ; IR0 = E/2 index for E
        LDI     3,R11               ; R11 holds the current stage number
        LDI     2,R4                ; R4 = N4
        LDI     4,R3                ; R3 = N2
LOOP    LDI     @INPUT,AR5          ; AR5 points to X(I)
        LSH3    2,R4,R10            ; Set loop counter
        ADDI3   IR0,R9,AR0          ; AR0 points to SIN/COS table
*
*   INNER LOOP (DO-20 LOOP IN THE PROGRAM)
*
INLOP   LDI     R4,IR1              ; IR1 = N4 or N2/2
        ADDI3   1,AR5,AR1           ; AR1 points to X(I1) = X(I + J)
        ADDI3   R3,AR1,AR3          ; AR3 points to X(I3) = X(I + J + N2)
        SUBI3   2,AR3,AR2           ; AR2 points to X(I2) = X(I - J + N2)
        ADDI    R3,AR2,AR4          ; AR4 points to X(I4) = X(I - J + N1)
        LDF     *AR5++(IR1),R0      ; R0 = X(I)
        ADDF    *+AR5(IR1),R0,R1    ; R1 = X(I) + X(I + N2)
        SUBF    R0,*++AR5(IR1),R0   ; R0 = -X(I) + X(I + N2)
||      STF     R1,*-AR5(IR1)       ; X(I) = X(I) + X(I + N2)
        NEGF    R0                  ; R0 = X(I) - X(I + N2)
        NEGF    *++AR5(IR1),R1      ; R1 = -X(I + N4 + N2)
||      STF     R0,*AR5             ; X(I + N2) = X(I) - X(I + N2)
        STF     R1,*AR5             ; X(I + N4 + N2) = -X(I + N4 + N2)
*
*   INNERMOST LOOP
*
        RPTBD   BLK3                ; Setup for BLK3 loop
        LSH3    -2,R8,IR1           ; IR1 = separation between SIN/COS tables
        SUBI    2,R4,RC             ; Repeat N4 - 1 times
        LDF     *AR3,R5             ; R5 = X(I3)
*
*   BLK3 LOOP
*
        MPYF    R5,*+AR0(IR1),R0    ; R0 = X(I3)*COS
        MPYF    *AR4,*AR0,R1        ; R1 = X(I4)*SIN
        MPYF    *AR4,*+AR0(IR1),R1  ; R1 = X(I4)*COS
||      ADDF    R0,R1,R2            ; R2 = X(I3)*COS + X(I4)*SIN
        MPYF    R5,*AR0++(IR0),R0   ; R0 = X(I3)*SIN
        SUBF    R0,R1,R0            ; R0 = -X(I3)*SIN + X(I4)*COS
        SUBF    *AR2,R0,R1          ; R1 = -X(I2) + R0
        ADDF    *AR2,R0,R1          ; R1 = X(I2) + R0
||      STF     R1,*AR3++           ; X(I3) = -X(I2) + R0
        ADDF    *AR1,R2,R1          ; R1 = X(I1) + R2
||      STF     R1,*AR4--           ; X(I4) = X(I2) + R0
        SUBF    R2,*AR1,R1          ; R1 = X(I1) - R2
||      STF     R1,*AR1++           ; X(I1) = X(I1) + R2
BLK3    LDF     *AR3,R5             ; Load next X(I3)
||      STF     R1,*AR2--           ; X(I2) = X(I1) - R2

        CMPI    @FFTSIZ,R10
        BLTAF   INLOP               ; Loop back to the inner loop
        ADDI    R4,AR5              ; AR5 = I + N1
        ADDI    R10,R10
        ADDI3   IR0,R9,AR0          ; AR0 points to SIN/COS table

        ADDI    1,R11
        CMPI    @LOGFFT,R11
        BLEAF   LOOP
        LSH     -1,IR0              ; E = E/2
        LSH     1,R4                ; N4 = 2*N4
        LSH     1,R3                ; N2 = 2*N2

END     BR      END                 ; Branch to itself at the end.
        .end
Example 12-37, Example 12-39, and Example 12-40 are written to make the 
operation of the FFT algorithm easy to follow; they are not optimized for 
execution speed. Example 12-41 shows a faster version of a radix-2 DIT FFT 
algorithm. This program uses a different twiddle-factor table than the 
previous examples: the twiddle factors are stored in bit-reversed order, and 
the table length is N/2 (N = FFT length). In the table below, R{WN(k)} and 
I{WN(k)} denote the real and imaginary parts of the twiddle factor WN(k). 
For instance, if the FFT length is 32, the twiddle-factor table should be: 

Address   Coefficient 

   0       R{WN(0)} = COS(2*PI*0/32) = 1 
   1      -I{WN(0)} = SIN(2*PI*0/32) = 0 
   2       R{WN(4)} = COS(2*PI*4/32) = 0.707 
   3      -I{WN(4)} = SIN(2*PI*4/32) = 0.707 
   . 
   . 
   . 
  12       R{WN(3)} = COS(2*PI*3/32) = 0.831 
  13      -I{WN(3)} = SIN(2*PI*3/32) = 0.556 
  14       R{WN(7)} = COS(2*PI*7/32) = 0.195 
  15      -I{WN(7)} = SIN(2*PI*7/32) = 0.981 



Example 12-41. Faster Version Complex, Radix-2 DIT FFT 

*   TITLE FASTER VERSION COMPLEX, RADIX-2 DIT FFT
*
*   GENERIC PROGRAM FOR A FAST LOOPED-CODE RADIX-2 DIT FFT
*   COMPUTATION IN TMS320C40
*
*   THE PROGRAM IS DERIVED FROM THE PAPER BY RAIMUND MEYER
*   AND KARL SCHWARZ, VOLUME 3, PROCEEDINGS OF ICASSP 90.
*
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION
*   IS DONE IN-PLACE, BUT THE RESULT IS MOVED TO ANOTHER MEMORY
*   SECTION TO DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*   FOR THIS PROGRAM THE MINIMUM FFT LENGTH IS 32 POINTS BECAUSE
*   OF THE SEPARATE STAGES. THE FIRST TWO PASSES ARE REALIZED AS A
*   FOUR-BUTTERFLY LOOP SINCE THE MULTIPLIES ARE TRIVIAL. THE
*   MULTIPLIER IS ONLY USED FOR A LOAD IN PARALLEL WITH AN ADDF
*   OR SUBF.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA
*   SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE
*   THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE
*   SIZE OF THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL
*   DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF
*   THE TABLE IS N/2.



        .global fft                 ; entry point for execution
        .global n                   ; FFT size
        .global nhalb               ; n/2
        .global nviert              ; n/4
        .global nachtel             ; n/8
        .global m                   ; log2(n)
        .global sine                ; address of sine table
        .BSS    inp,2048            ; input vector length = 2n
                                    ; (depends on n)
        .BSS    outp,2048           ; output vector length = 2n
                                    ; (depends on n)
        .text
fftsiz  .word   n
fg4m2   .word   nviert-2
fg4m3   .word   nviert-3
fg8m2   .word   nachtel-2
fg2     .word   nhalb
fg2m3   .word   nhalb-3
logfft  .word   m
sintab  .word   sine
sintm1  .word   sine-1
sintp2  .word   sine+2
input   .word   inp
inputp2 .word   inp+2
output  .word   outp
*
* ar0 : AR + AI
* ar1 : BR + BI
* ar2 : CR + CI + CR' + CI'
* ar3 : DR + DI
* ar4 : AR' + AI'
* ar5 : BR' + BI'
* ar6 : DR' + DI'
* ar7 : first twiddle factor = 1
*
fft:    ldp     fftsiz              ; load page pointer
        ldi     @fg2,ir0            ; ir0 = n/2 = offset between inputs
        ldi     @sintab,ar7         ; ar7 points to twiddle factor 1
        ldi     @input,ar0          ; ar0 points to AR
        addi    ir0,ar0,ar1         ; ar1 points to BR
        addi    ir0,ar1,ar2         ; ar2 points to CR
        addi    ir0,ar2,ar3         ; ar3 points to DR
        ldi     ar0,ar4             ; ar4 points to AR'
        ldi     ar1,ar5             ; ar5 points to BR'
        ldi     ar3,ar6             ; ar6 points to DR'
        ldi     2,ir1               ; address offset
        lsh     -1,ir0              ; ir0 = n/4 = number of
                                    ; R4-butterflies
        subi    2,ir0,rc
*************************************
* FIRST 2 STAGES AS RADIX-4 BUTTERFLY
*************************************
*
* fill pipeline
*
        addf    *ar2,*ar0,r4        ; r4 = AR + CR
        subf    *ar2,*ar0++,r5      ; r5 = AR - CR
        addf    *ar1,*ar3,r6        ; r6 = DR + BR
        subf    *ar1++,*ar3++,r7    ; r7 = DR - BR
        addf    r6,r4,r0            ; AR' = r0 = r4 + r6
        mpyf    *ar3++,*ar7,r1      ; r1 = DI , BR' = r3 = r4 - r6
||      subf    r6,r4,r3
        addf    r1,*ar1,r0          ; r0 = BI + DI , AR' = r0
||      stf     r0,*ar4++
        subf    r1,*ar1++,r1        ; r1 = BI - DI , BR' = r3
||      stf     r3,*ar5++
        addf    r1,r5,r2            ; CR' = r2 = r5 + r1
        mpyf    *+ar2,*ar7,r1       ; r1 = CI , DR' = r3 = r5 - r1
||      subf    r1,r5,r3
        rptbd   blk1                ; Setup for radix-4 butterfly loop
        addf    r1,*ar0,r2          ; r2 = AI + CI , CR' = r2
||      stf     r2,*ar2++(ir1)
        subf    r1,*ar0++,r6        ; r6 = AI - CI , DR' = r3
||      stf     r3,*ar6++
        addf    r0,r2,r4            ; AI' = r4 = r2 + r0
*
* radix-4 butterfly loop
*
        mpyf    *ar2--,*ar7,r0      ; r0 = CR , (BI' = r2 = r2 - r0)
||      subf    r0,r2,r2
        mpyf    *ar1++,*ar7,r1      ; r1 = BR , (CI' = r3 = r6 + r7)
||      addf    r7,r6,r3
        addf    r0,*ar0,r4          ; r4 = AR + CR , (AI' = r4)
||      stf     r4,*ar4++
        subf    r0,*ar0++,r5        ; r5 = AR - CR , (BI' = r2)
||      stf     r2,*ar5++
        subf    r7,r6,r7            ; (DI' = r7 = r6 - r7)
        addf    r1,*ar3,r6          ; r6 = DR + BR , (DI' = r7)
||      stf     r7,*ar6++
        subf    r1,*ar3++,r7        ; r7 = DR - BR , (CI' = r3)
||      stf     r3,*ar2++
        addf    r6,r4,r0            ; AR' = r0 = r4 + r6
        mpyf    *ar3++,*ar7,r1      ; r1 = DI , BR' = r3 = r4 - r6
||      subf    r6,r4,r3
        addf    r1,*ar1,r0          ; r0 = BI + DI , AR' = r0
||      stf     r0,*ar4++
        subf    r1,*ar1++,r1        ; r1 = BI - DI , BR' = r3
||      stf     r3,*ar5++
        addf    r1,r5,r2            ; CR' = r2 = r5 + r1
        mpyf    *+ar2,*ar7,r1       ; r1 = CI , DR' = r3 = r5 - r1
||      subf    r1,r5,r3
        addf    r1,*ar0,r2          ; r2 = AI + CI , CR' = r2
||      stf     r2,*ar2++(ir1)
        subf    r1,*ar0++,r6        ; r6 = AI - CI , DR' = r3
||      stf     r3,*ar6++
blk1    addf    r0,r2,r4            ; AI' = r4 = r2 + r0
*
* clear pipeline
*
        subf    r0,r2,r2            ; BI' = r2 = r2 - r0
        addf    r7,r6,r3            ; CI' = r3 = r6 + r7
        stf     r4,*ar4             ; AI' = r4 , BI' = r2
||      stf     r2,*ar5
        subf    r7,r6,r7            ; DI' = r7 = r6 - r7
        stf     r7,*ar6             ; DI' = r7 , CI' = r3
||      stf     r3,*--ar2
*
*************************************
* THIRD TO LAST-2 STAGE
*************************************
        ldi     @fg2,ir1
        subi    1,ir0,ar5
        ldi     1,ar6
        ldi     @sintab,ar7         ; pointer to twiddle factor
        ldi     0,ar4               ; group counter
        ldi     @input,ar0          ; upper real butterfly input
stufe   ldi     ar0,ar2             ; upper real butterfly output
        addi    ir0,ar0,ar3         ; lower real butterfly output
        ldi     ar3,ar1             ; lower real butterfly input
        lsh     1,ar6               ; double group count
        lsh     -2,ar5              ; half butterfly count
        lsh     1,ar5               ; clear LSB
        lsh     -1,ir0              ; half step from upper to lower real part
        lsh     -1,ir1
        addi    1,ir1               ; step from old imaginary to new real value
        ldf     *ar1++,r6           ; dummy load, only for address update
*
* ar0 = upper real butterfly input
* ar1 = lower real butterfly input
* ar2 = upper real butterfly output
* ar3 = lower real butterfly output
* the imaginary part has to follow
*
* fill pipeline
*
        ldf     *ar7,r7             ; r7 = COS
gruppe  ldf     *++ar7,r6           ; r6 = SIN
        mpyf    *ar1--,r6,r1        ; r1 = BI * SIN
        addf    *++ar4,r0,r3        ; dummy addf for counter update
        mpyf    *ar1,r7,r0          ; r0 = BR * COS
        mpyf    *ar1++,*ar7--,r0    ; r3 = TR = r0 + r1 , r0 = BR * SIN
||      addf    r0,r1,r3
        rptbd   bfly1               ; Setup for loop bfly1
        mpyf    *ar1++,r7,r1        ; r1 = BI * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        ldi     ar5,rc



*
* FIRST BUTTERFLY-TYPE:
*
* TR = BR * COS + BI * SIN
* TI = BR * SIN - BI * COS
*
* AR' = AR + TR
* AI' = AI - TI
*
* BR' = AR - TR
* BI' = AI + TI
*
* loop bfly1
*
        mpyf    *+ar1,r6,r5         ; r5 = BI * SIN , (AR' = r5)
||      stf     r5,*ar2++
        subf    r1,r0,r2            ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r7,r0          ; r0 = BR * COS , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4        ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        addf    r0,r5,r3            ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0        ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r7,r1        ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++
bfly1   addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
*
* switch over to next group
*
        subf    r1,r0,r2            ; r2 = TI = r0 - r1
        addf    r2,*ar0,r3          ; r3 = AI + TI , AR' = r5
||      stf     r5,*ar2++
        subf    r2,*ar0++(ir1),r4   ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3++(ir1)
        nop     *ar1++(ir1)         ; address update
        mpyf    *ar1--,r7,r1        ; r1 = BI * COS , AI' = r4
||      stf     r4,*ar2++(ir1)
        mpyf    *ar1,r6,r0          ; r0 = BR * SIN
        mpyf    *ar1++,*ar7++,r0    ; r3 = TR = r1 - r0 , r0 = BR * COS
||      subf    r0,r1,r3
        rptbd   bfly2               ; Setup for loop bfly2
        mpyf    *ar1++,r6,r1        ; r1 = BI * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        ldi     ar5,rc
*
* SECOND BUTTERFLY-TYPE:
*
* TR = BI * COS - BR * SIN
* TI = BI * SIN + BR * COS
*
* AR' = AR + TR
* AI' = AI - TI
*
* BR' = AR - TR
* BI' = AI + TI
*
* loop bfly2
*
        mpyf    *+ar1,r7,r5         ; r5 = BI * COS , (AR' = r5)
||      stf     r5,*ar2++
        addf    r1,r0,r2            ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r6,r0          ; r0 = BR * SIN , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4        ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        subf    r0,r5,r3            ; TR = r3 = r5 - r0
        mpyf    *ar1++,r7,r0        ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r6,r1        ; r1 = BI * SIN , (AI' = r4)
||      stf     r4,*ar2++
bfly2   addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
*
* clear pipeline
*
        addf    r1,r0,r2            ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3          ; r3 = AI + TI
||      stf     r5,*ar2++           ; AR' = r5
        cmpi    ar6,ar4
        bned    gruppe              ; do following 3 instructions
        subf    r2,*ar0++(ir1),r4   ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3++(ir1)
        ldf     *++ar7,r7           ; r7 = COS
||      stf     r4,*ar2++(ir1)      ; AI' = r4
        nop     *ar1++(ir1)         ; branch here
*
* end of this butterfly group
*
        cmpi    4,ir0               ; jump out after ld(n)-3 stage
        bnzaf   stufe
        ldi     @sintab,ar7         ; pointer to twiddle factor
        ldi     0,ar4               ; group counter
        ldi     @input,ar0          ; upper real butterfly input
*
*************************************
* SECOND LAST STAGE
*************************************
        ldi     @input,ar0          ; upper input
        ldi     ar0,ar2             ; upper output
        addi    ir0,ar0,ar1         ; lower input
        ldi     ar1,ar3             ; lower output
        ldi     @sintp2,ar7         ; pointer to twiddle factor
        ldi     5,ir0               ; distance between two groups
        ldi     @fg8m2,rc





* fill pipeline 



TR 

TI 

AR' 

AI' 

BR' 

BI' 

* 5. to M. butterfly:
*
* loop bf2end
*
        ldf     *ar7++,r7           ; r7 = COS , ((AI' = r4))
||      stf     r4,*ar2++
        ldf     *ar7++,r6           ; r6 = SIN , (BR' = r2)
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5         ; r5 = BI * SIN , (AR' = r3)
||      stf     r3,*ar2++
        addf    r1,r0,r2            ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r7,r0          ; r0 = BR * COS , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4   ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        addf    r0,r5,r3            ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0        ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r7,r1        ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5         ; r5 = BI * SIN , (AR' = r5)
||      stf     r5,*ar2++
        subf    r1,r0,r2            ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r7,r0          ; r0 = BR * COS , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4        ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        addf    r0,r5,r3            ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0        ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++(ir0),r7,r1   ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++
        addf    *ar0++,r3,r3        ; r3 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5         ; r5 = BI * COS , (AR' = r3)
||      stf     r3,*ar2++
        subf    r1,r0,r2            ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r6,r0          ; r0 = BR * SIN , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4   ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        subf    r0,r5,r3            ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0        ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r6,r1        ; r1 = BI * SIN , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r5        ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5         ; r5 = BI * COS , (AR' = r5)
||      stf     r5,*ar2++
        addf    r1,r0,r2            ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r6,r0          ; r0 = BR * SIN , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4        ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        subf    r0,r5,r3            ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0        ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
bf2end  mpyf    *ar1++(ir0),r6,r1   ; r1 = BI * SIN , r3 = AR + TR
||      addf    *ar0++,r3,r3
*
* clear pipeline
*
        stf     r2,*ar3++           ; BR' = r2 , AI' = r4
||      stf     r4,*ar2++
        addf    r1,r0,r2            ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3          ; r3 = AI + TI , AR' = r3
||      stf     r3,*ar2++
        subf    r2,*ar0,r4          ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3
        stf     r4,*ar2             ; AI' = r4
*
*************************************
* LAST STAGE
*************************************
        ldi     @input,ar0          ; upper input
        ldi     ar0,ar2             ; upper output
        ldi     @inputp2,ar1        ; lower input
        ldi     ar1,ar3             ; lower output
        ldi     @sintp2,ar7         ; pointer to twiddle factors
        ldi     3,ir0               ; group offset
        ldi     @fg4m2,rc
*
* fill pipeline
*
* 1. butterfly: w^0
*
        addf    *ar0,*ar1,r6        ; AR' = r6 = AR + BR
        subf    *ar1++,*ar0++,r7    ; BR' = r7 = AR - BR
        addf    *ar0,*ar1,r4        ; AI' = r4 = AI + BI
        subf    *ar1++(ir0),*ar0++(ir0),r5  ; BI' = r5 = AI - BI
*
* 2. butterfly: w^M/4
*
        addf    *+ar1,*ar0,r3       ; AR' = r3 = AR + BI
        ldf     *-ar7,r1            ; r1 = 0 (for inner loop)
||      ldf     *ar1++,r0           ; r0 = BR (for inner loop)
        rptbd   bflend              ; Setup for loop bflend
        subf    *ar1++(ir0),*ar0++,r2  ; BR' = r2 = AR - BI
        stf     r6,*ar2++           ; (AR' = r6)
||      stf     r7,*ar3++           ; (BR' = r7)
        stf     r5,*ar3++(ir0)      ; (BI' = r5)

* 3. to M. butterfly:
*
* loop bflend
*
        ldf     *ar7++,r7           ; r7 = COS , ((AI' = r4))
||      stf     r4,*ar2++(ir0)
        ldf     *ar7++,r6           ; r6 = SIN , (BR' = r2)
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5         ; r5 = BI * SIN , (AR' = r3)
||      stf     r3,*ar2++
        addf    r1,r0,r2            ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r7,r0          ; r0 = BR * COS , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4   ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        addf    r0,r5,r3            ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0        ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++(ir0),r7,r1   ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r3        ; r3 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5         ; r5 = BI * COS , (AR' = r3)
||      stf     r3,*ar2++
        subf    r1,r0,r2            ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r6,r0          ; r0 = BR * SIN , (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4   ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        subf    r0,r5,r3            ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0        ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
bflend  mpyf    *ar1++(ir0),r6,r1   ; r1 = BI * SIN , r3 = AR + TR
||      addf    *ar0++,r3,r3
*
* clear pipeline
*
        stf     r2,*ar3++           ; BR' = r2 , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    r1,r0,r2            ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3          ; r3 = AI + TI , AR' = r3
||      stf     r3,*ar2++
        subf    r2,*ar0,r4          ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3
        stf     r4,*ar2             ; AI' = r4

*************************************
* END OF FFT
*************************************

*************************************
* BIT REVERSAL
*************************************
        ldi     @fftsiz,ir0
        ldi     2,ir1
        ldi     @input,ar0
        ldi     @output,ar1
        ldi     ir0,rc
        subi    2,rc
        ldf     *+ar0(1),r0
        rptb    bitrv
        ldf     *ar0++(ir0)b,r1
||      stf     r0,*+ar1(1)
bitrv   ldf     *+ar0(1),r0
||      stf     r1,*ar1++(ir1)
        ldf     *ar0++(ir0)b,r1
||      stf     r0,*+ar1(1)
        stf     r1,*ar1
end:    nop
        nop
        nop
        nop
self    br      self
        .end





The 'C40 executes FFTs of up to 1024 complex points (2048 real points) 
quickly because it can perform them almost entirely in on-chip memory; these 
lengths cover most applications. Table 12-2 summarizes the execution time 
required for FFT lengths between 64 and 1024 points for the four algorithms 
in Example 12-37, Example 12-39, Example 12-40, and Example 12-41. 
Table 12-2. TMS320C40 FFT Timing Benchmarks 

                       FFT Timing (in milliseconds) 

Number    Complex           Complex           Complex           Real 
of        Radix-2           Radix-2           Radix-4           Radix-2 
Points    (Example 12-37)   (Example 12-41)   (Example 12-39)   (Example 12-40) 

  64      0.09112           0.0606            0.0694            0.04 
 128      0.2066            0.13316           -                 0.09156 
 256      0.46288           0.3058            0.36756           0.20712 
 512      1.02636           0.69208           -                 0.45988 
1024      2.25544           1.54516           1.82924           1.01984 


12.4.5 Lattice Filters 

The lattice form is an alternative way of implementing digital filters; it has 
found applications in speech processing, spectral estimation, and other 
areas. This discussion uses the notation and terminology of speech 
processing applications. 

If H(z) is the transfer function of a digital filter that has only poles, then 
A(z) = 1/H(z) is a filter that has only zeros; it is called the inverse filter. 
The inverse lattice filter is shown in Figure 12-5. These equations describe 
the filter in mathematical terms: 

    f(i,n) = f(i-1,n) + k(i) b(i-1,n-1) 
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n) 

Initial conditions: 

    f(0,n) = b(0,n) = x(n) 

Final conditions: 

    y(n) = f(p,n) 

In the above equations, f(i,n) is the forward error, b(i,n) is the backward 
error, k(i) is the ith reflection coefficient, x(n) is the input signal, and y(n) 
is the output signal. The order of the filter (that is, the number of stages) 
is p. In the linear predictive coding (LPC) method of speech processing, the 
inverse lattice filter is used during analysis, and the (forward) lattice filter is 
used during speech synthesis. 
Figure 12-5. Structure of the Inverse Lattice Filter 

[Figure: forward path labeled x(n), f(0,n), f(1,n), ..., f(p,n) = y(n); 
backward path labeled b(0,n), b(1,n), ..., b(M,n).] 



Figure 12-6 shows the data memory organization of the inverse lattice filter 
on the 'C40. 



Figure 12-6. Data Memory Organization for Inverse Lattice Filters 

[Figure: reflection coefficients stored from low address to high address: 
k(1), ..., k(p).] 
Example 12-42. Inverse Lattice Filter 

*   TITLE INVERSE LATTICE FILTER
*
*   SUBROUTINE LATINV
*
*   LATINV == LATTICE FILTER (LPC INVERSE FILTER - ANALYSIS)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       LAJU    LATINV
*       load    AR0
*       load    AR1
*       load    RC
*
*   ARGUMENT ASSIGNMENTS:
*
*     ARGUMENT | FUNCTION
*     ---------+--------------------------------------------------
*     R2       | f(0,n) = x(n)
*     AR0      | ADDRESS OF FILTER COEFFICIENTS (k(1))
*     AR1      | ADDRESS OF BACKWARD PROPAGATION VALUES (b(0,n-1))
*     RC       | RC = p - 2
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*   REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*   REGISTER CONTAINING RESULT: R2 (f(p,n))
*
*   PROGRAM SIZE: 11 WORDS
*
*   EXECUTION CYCLES: 5 + 3p
*
        .global LATINV
*
*   i = 1
*
LATINV  RPTBD   LOOP                ; Setup the delayed repeat block loop
        MPYF3   *AR0,*AR1,R0        ; k(1) * b(0,n-1) -> R0
                                    ; Assume f(0,n) -> R2
        LDF     R2,R3               ; Put b(0,n) = f(0,n) -> R3
        MPYF3   *AR0++(1),R2,R1     ; k(1) * f(0,n) -> R1
*
*   2 <= i <= p (Repeat block loop starts here)
*
        MPYF3   *AR0,*++AR1(1),R0   ; k(i) * b(i-1,n-1) -> R0
||      ADDF3   R2,R0,R2            ; f(i-1-1,n) + k(i-1)*b(i-1-1,n-1) = f(i-1,n) -> R2
        ADDF3   *-AR1(1),R1,R3      ; b(i-1-1,n-1) + k(i-1)*f(i-1-1,n) = b(i-1,n) -> R3
||      STF     R3,*-AR1(1)         ; b(i-1-1,n) -> b(i-1-1,n-1)
LOOP    MPYF3   *AR0++(1),R2,R1     ; k(i) * f(i-1,n) -> R1
*
*   i = p + 1 (CLEANUP)
*
        BUD     R11                 ; Delayed return
        ADDF3   R2,R0,R2            ; f(p-1,n) + k(p)*b(p-1,n-1) = f(p,n) -> R2
        ADDF3   *AR1,R1,R3          ; b(p-1,n-1) + k(p)*f(p-1,n) = b(p,n) -> R3
||      STF     R3,*AR1             ; b(p-1,n) -> b(p-1,n-1)
        NOP
end
        .end



The structure of the forward lattice filter, shown in Figure 12-7, is similar to 
that of the inverse filter (also shown in the figure). These corresponding 
equations describe the lattice filter: 

    f(i-1,n) = f(i,n) - k(i) b(i-1,n-1) 
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n) 

Initial conditions: 

    f(p,n) = x(n), b(i,n-1) = 0 for i = 1, ..., p 

Final conditions: 

    y(n) = f(0,n) 

The data memory organization is identical to that of the inverse filter, as 
shown in Figure 12-6. Example 12-43 shows the implementation of the lattice 
filter on the 'C40. 
Figure 12-7. Structure of the (Forward) Lattice Filter 

[Figure: input x(n) = f(p,n); backward signals labeled b(p,n), ..., b(2,n), 
b(1,n).] 



Example 12-43. Lattice Filter 

*   TITLE LATTICE FILTER
*
*   SUBROUTINE LATICE
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       LAJU    LATICE
*       load    AR0
*       load    AR1
*       load    RC
*
*   ARGUMENT ASSIGNMENTS:
*
*     ARGUMENT | FUNCTION
*     ---------+--------------------------------------------
*     R2       | F(P,N) = E(N) = EXCITATION
*     AR0      | ADDRESS OF FILTER COEFFICIENTS (K(P))
*     AR1      | ADDRESS OF BACKWARD PROPAGATION
*              | VALUES (B(P-1,N-1))
*     RC       | RC = P - 2
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*   REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*   REGISTER CONTAINING RESULT: R2 (f(0,n))
*
*   PROGRAM SIZE: 13 WORDS






* 
* 

LATICE 



LOOP 



EXECUTION CYCLES: 3 + 5P 
.global LATICE 



RPTBD 

MPYF3 
SUBF3 
NOP 



LOOP 

*AR0, *AR1 , RO 
R0,R2,R2 



Setup the delayed repeat 
block loop 

K(P) * B(P-1,N-1) -> RO 
Assume F(P,N) -> R2 
F (P,N) -K(P) *B(P-1,N-1) 
- F(P-1,N) -> R2 



2 <= i <= p (Repeat block loop start here) 



MPYF3 *AR0,R2,R1 

MPYF3 * - -ARO (1) , *-ARl (1) , RO 

ADDF3 *AR1 (1),R1,R3 

STF R3,*+AR1(2) 

SUBF3 R0,R2,R2 



1=1 (CLEANUP) 



BUD 

MPYF 

ADDF3 

STF 
STF 

end 

end 



Rll 

*AR0,R2,R1 
*AR1,R1,R3 

R3,*+AR1 (1) 
R2, *AR1 



K(I) * F(M,N) -> Rl 
K(I-l) * 

B(I-1-1,N-1) -> RO 

B(I-1,N-1) + K(I) *F(I-1,N) 

= B(I,N) -> R3 

B(I,N) -> B(I,N-1) 

F(M,N)-K(M) 

*B(M-1,N-1) 

= F ( 1-1-1 , N) -> R2 



Delayed return 
K(l) * F(0,N) -> Rl 
B(0,N-1) + K(l) *F(0,N) 
= B(1,N) -> R3 
B(1,N) -> B(1,N-1) 
F(0,N) -> B(0,N-1) 
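As a cross-check for the assembly routine, the following Python model (a sketch, not part of the original example) implements the lattice equations above in floating point. It keeps b(0,n-1) through b(p-1,n-1) in a list that is updated in place from stage p down to stage 1, in the same order as the repeat block, and, like the cleanup code, stores f(0,n) as the new b(0,n). The function name and list layout are illustrative assumptions.

```python
def lattice_filter(x, k):
    """Forward lattice filter: returns y(n) = f(0, n) for each input sample x(n).

    k holds the reflection coefficients [k(1), ..., k(p)].
    b[i] holds b(i, n-1) for i = 0..p-1 (zero initial conditions).
    """
    p = len(k)
    b = [0.0] * p
    y = []
    for xn in x:
        f = xn                                   # f(p, n) = x(n)
        for i in range(p, 0, -1):                # stages p, p-1, ..., 1
            f_next = f - k[i - 1] * b[i - 1]     # f(i-1, n) = f(i, n) - k(i) b(i-1, n-1)
            if i < p:                            # b(p, n) is never read again
                b[i] = b[i - 1] + k[i - 1] * f_next  # b(i, n) = b(i-1, n-1) + k(i) f(i-1, n)
            f = f_next
        b[0] = f                                 # b(0, n) = f(0, n)
        y.append(f)
    return y


# Impulse response of a one-stage filter with k(1) = 0.5
print(lattice_filter([1.0, 0.0, 0.0], [0.5]))   # [1.0, -0.5, 0.25]
```

Running the model and the 'C40 routine on the same input and comparing the outputs sample by sample is one way to check the coefficient and delay-line layout; small differences are expected because the 'C40 floating-point format is not IEEE double precision.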
12.5 Programming Tips

Programming style is highly personal and reflects each individual's prefer- 
ences and experiences. The purpose of this section is not to impose any 
particular style. Instead, it emphasizes some of the features of the 'C40 that 
can help in producing faster and/or shorter programs. The tips cover both 
C compiler and assembly language programming. 

12.5.1 C-Callable Routines 

The 'C40 was designed with a large register file, a software stack, and a
large memory space so that a high-level language (HLL) compiler could be
implemented easily. The first such compiler supplied is a C compiler. Using
the C compiler makes applications that have been tested on large,
general-purpose computers more portable and decreases their porting time.

To use the compiler efficiently, complete the following steps: 

1) Write the application in the high-level language.

2) Debug the program.

3) Determine whether it runs in real time.

4) If it doesn't, identify places where most of the execution time is spent. 

5) Optimize these areas by writing assembly language routines that imple- 
ment the functions. 

6) Call the routines from the C program as C functions. 

When writing a C program, you can increase execution speed by maximizing
the use of register variables. For more information, refer to the
TMS320 Floating-Point DSP Optimizing C Compiler User's Guide (literature
number SPRU034, due for release 3Q, 1991).

Certain conventions must be observed in writing a C-callable routine. These 
conventions are outlined in the Runtime Environment chapter of the 
TMS320 Floating-Point DSP Optimizing C Compiler User's Guide. Certain 
registers are saved by the calling function, and others need to be saved by 
the called function. The C compiler manual helps achieve a clean interface. 
The end result is the readability and natural flow of a high-level language 
combined with the efficiency and special-feature use of assembly language. 




12.5.2 Hints for Optimizing Assembly Code 

Each program has particular requirements. Not all possible optimizations 
will make sense in every case. The suggestions presented in this section 
can be used as a checklist of available software tools. 

□ Use delayed branches. Delayed branches execute in a single cycle;
regular branches execute in four. The three instructions that follow the
delayed branch are executed whether or not the branch is taken. If fewer
than three useful instructions are available, still use the delayed branch
and fill the remaining slots with NOPs. Machine cycles are saved as long
as at least one slot holds a useful instruction.

□ Use delayed subroutine call and return. Regular subroutine CALL
and RETS instructions execute in four cycles. A delayed subroutine call
and return can be achieved with the link-and-jump instruction (LAJ) and
a delayed branch through register R11 (BUD R11). Both LAJ and BUD
execute in a single cycle. The rule for using the LAJ instruction is the
same as for delayed branches: the three instructions that follow it are
always executed.

□ Apply the repeat single/block construct. In this way, loops are 
achieved with no overhead. Nesting such constructs will not normally 
increase efficiency, so try to use the feature on the most often per- 
formed loop. The RPTBD is a single-cycle instruction, and the RPTS 
and RPTB are four-cycle instructions. The usage of RPTBD is similar 
to that of the delayed branches. Note that RPTS is not interruptible, and 
the executed instruction is not refetched for execution. This frees the 
buses for operands. 

□ Use parallel instructions. It is possible to have a multiply in parallel 
with an add (or subtract) and to have stores in parallel with any multiply 
or ALU operation. This increases the number of operations executed in 
a single cycle. For maximum efficiency, observe the addressing modes 
used in parallel instructions and arrange the data appropriately. You can 
have loads in parallel with any multiply or add (or subtract). Since the 
result of a multiply by one or an add of zero is the same as a load, parallel 
instructions with a data load can be implemented by substituting the 
load instruction with a multiply or an add instruction with one extra 
register containing a one or zero. 
□ Maximize the use of registers. The registers are an efficient way to
access scratch-pad memory. Extensive use of the register file facilitates
the use of parallel instructions and helps avoid pipeline conflicts when
you use register addressing (register addressing is described in
subsection 5.1.1 on page 5-3).

□ Use the cache. Use cache especially in conjunction with slow external 
memory. The cache is transparent to the user, so make sure that it is 
enabled. 

□ Use internal memory instead of external memory. The internal
memory (2K x 32-bit RAM and 4K x 32-bit ROM) is considerably faster
to access. In a single cycle, two operands can be brought from internal
memory. You can maximize performance if you use the DMA in parallel
with the CPU to transfer data to internal memory before you operate on
it.

□ Avoid pipeline conflicts. If there is no problem with program speed, 
ignore this suggestion. For time-critical operations, make sure that 
cycles are not missed because of conflicts. To identify conflicts, run the 
trace function on the development tools (simulator, emulators) with the 
program tracing option enabled. The tracing immediately identifies the 
pipeline conflicts. Consult the appropriate section of this user's guide 
for an explanation of the reason for the conflict. You can then take steps 
to correct the problem. 
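The cycle arithmetic behind the first two tips can be made concrete with a small Python sketch. It assumes single-cycle instructions and no pipeline conflicts, and it uses the counts given above: four cycles for a standard branch, CALL, or RETS; one cycle for a delayed branch, LAJ, or BUD R11, followed by three delay slots. The names are hypothetical.

```python
BRANCH_CYCLES = 4          # standard branch, CALL, RETS
DELAYED_BRANCH_CYCLES = 1  # delayed branch, LAJ, BUD R11
DELAY_SLOTS = 3            # instructions after a delayed branch that always execute


def cycles_saved(useful_slot_instructions):
    """Cycles saved by replacing a standard branch with a delayed branch.

    Assumes every instruction takes one cycle and no pipeline conflicts occur.
    Useful instructions placed in the delay slots would otherwise execute
    separately; slots without useful work are filled with NOPs.
    """
    if not 0 <= useful_slot_instructions <= DELAY_SLOTS:
        raise ValueError("a delayed branch has exactly three delay slots")
    standard = BRANCH_CYCLES + useful_slot_instructions
    delayed = DELAYED_BRANCH_CYCLES + DELAY_SLOTS   # the slots always take a cycle each
    return standard - delayed


for n in range(DELAY_SLOTS + 1):
    print(n, "useful slot instruction(s):", cycles_saved(n), "cycle(s) saved")
```

With three NOPs the delayed branch only breaks even; each useful instruction moved into a delay slot saves one cycle. The same reasoning applies to RPTBD (one cycle) versus RPTB (four cycles).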

The above checklist is not exhaustive, and it does not address some fea- 
tures outlined in more detail in the different sections of this manual. To learn 
how to exploit the full power of the 'C40, carefully study its architecture, 
hardware configuration, and instruction set described in this user's guide. 
12.6 Peripherals

TMS320C40 peripheral modules include one analysis module, two timers,
a six-channel direct memory access (DMA) coprocessor, and six high-speed
bidirectional communication ports. They are designed to improve system
performance and decrease system cost without reducing the computational
throughput of the CPU. These peripheral modules are controlled through
memory-mapped registers located on the dedicated peripheral bus. The
following subsections present examples that show how to program the timer,
communication port, and DMA operations.

12.6.1 Timers 

There are two general-purpose 32-bit timers on the 'C40. The two timers
are identical and independent of each other (detailed information on the
timers is in Section 9.10 on page 9-45). Each timer is controlled by three
registers: the timer global control register, the timer counter register, and
the timer period register. Pins TCLK0 and TCLK1 of the 'C40 are dedicated
to the timers. Each pin can be configured either as a general-purpose data
I/O pin or as a timer pin.

If bit 0 and bit 9 of the timer global control register are set to 0, the TCLKx 
pin is configured as a general-purpose data I/O pin. Timer counter and peri- 
od registers have no effect on this configuration. Bit 1 of the timer global con- 
trol register is used to configure TCLKx as an input or output pin. If TCLKx 
is configured as an output pin (bit 1=1), the data value in bit 2 of the timer 
global control register is shown on TCLKx. If TCLKx is configured as an in- 
put pin (bit 1 = 0), the signal on TCLKx is shown in bit 3 of the timer global 
control register. 

If bit 0 of the timer global control register is set to 1, pin TCLKx is configured
as a timer pin. The frequency of the timer signal is specified by the timer
period register. However, this assumes that the timer counter register
equals 0 (writing 1 to bit 6 of the timer global control register also resets
the counter register). If the timer counter register holds a nonzero value,
the first period will differ from the others. When the counter register is set
to a value greater than the period register, the counter will count, roll over
to 0, and then continue counting up to the period value. Therefore, it is
important to load correct values into the timer period and counter registers
before starting the timer (writing a 1 to bit 7 of the timer global control
register starts the timer).
The frequency of the timer signal is determined by the frequency of the
timer input clock and the value of the period register. The following
equations are valid with either an internal or an external timer clock:

$$f_{\text{pulse mode}} = \frac{f_{\text{timer clock}}}{\text{period register}}$$
$$f_{\text{clock mode}} = \frac{f_{\text{timer clock}}}{2 \times \text{period register}}$$
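The two equations can be evaluated with a short Python helper, shown here as a sketch. The 10-MHz timer clock in the usage lines is an arbitrary value chosen for illustration, and the function name is hypothetical. The period-0 case is rejected because the timer behaves differently there, as the next paragraph explains.

```python
def timer_output_hz(f_timer_clock_hz, period, clock_mode):
    """Timer output frequency from the pulse-mode and clock-mode equations.

    clock_mode=False is pulse mode (C/P = 0); clock_mode=True is clock mode (C/P = 1).
    A period of 0 is a special case, so it is rejected here.
    """
    if period <= 0:
        raise ValueError("period must be positive; period 0 is a special case")
    if clock_mode:
        return f_timer_clock_hz / (2 * period)
    return f_timer_clock_hz / period


# Hypothetical 10-MHz timer clock with a period register value of 1000
print(timer_output_hz(10e6, 1000, clock_mode=False))   # 10000.0
print(timer_output_hz(10e6, 1000, clock_mode=True))    # 5000.0
```

For a given period, clock mode always produces half the pulse-mode frequency, because each output cycle spans two timer periods.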

When the period and counter registers are zero, the operation of the timer
depends on the C/P mode selected. In pulse mode (C/P = 0), TSTAT is set
and remains set; in other words, the frequency is effectively infinite and the
output never toggles. In clock mode (C/P = 1), the width of the cycle is
2/f(H1), and the external clock is ignored. Therefore, the maximum frequency
of a timer clock generated from the internal clock is f(H1)/2. Example 12-44
shows how to set up the 'C40 timer to generate the maximum-frequency
clock on the TCLKx pin.

Example 12-44. Maximum Frequency Timer Clock Setup

*
*   TITLE MAXIMUM FREQUENCY TIMER CLOCK SETUP
*
*   THIS EXAMPLE SHOWS HOW TO SET UP TIMER 0 TO GENERATE THE MAXIMUM
*   FREQUENCY TIMER CLOCK USING THE INTERNAL CLOCK. THE
*   "TIMER_REGISTER" SECTION IS LOCATED AT 00100020H.
*
TIM0_CTL_REG    .usect  "TIMER_REGISTER",4  ; Timer 0 global control register
TIM0_CNT_REG    .usect  "TIMER_REGISTER",4  ; Timer 0 counter register
TIM0_PRD_REG    .usect  "TIMER_REGISTER",8  ; Timer 0 period register
*
        .text
        LDP     @TIM0_CTL_REG           ; Load data page pointer
        LDI     0,R0
        STI     R0,@TIM0_PRD_REG        ; Period register = 0
        LDI     3C1H,R0                 ; Timer pin, reset counter,
        STI     R0,@TIM0_CTL_REG        ; and start the timer
        .end
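The 3C1H value written in Example 12-44 can be assembled from individual control bits. In this Python sketch, bits 0 (timer pin), 6 (reset counter), and 7 (start timer) follow the description earlier in this subsection. The positions of the C/P bit (bit 8) and the clock-source bit (bit 9) are assumptions that should be checked against the timer global control register description in Section 9.10.

```python
FUNC = 1 << 0      # 1: TCLKx is a timer pin (from the text)
GO = 1 << 6        # 1: reset the counter register (from the text)
HLD_ = 1 << 7      # 1: start the timer (from the text)
C_P = 1 << 8       # assumed position: 1 = clock mode, 0 = pulse mode
CLKSRC = 1 << 9    # assumed position: 1 = internal timer clock


def timer_control_word(clock_mode=True, internal_clock=True):
    """Build a timer global control word that makes TCLKx a timer pin,
    resets the counter, and starts the timer."""
    word = FUNC | GO | HLD_
    if clock_mode:
        word |= C_P
    if internal_clock:
        word |= CLKSRC
    return word


print(hex(timer_control_word()))   # 0x3c1
```

Building the word from named bits makes it easier to switch, for example, to pulse mode (0x2C1 under the same assumptions) without recomputing the hexadecimal constant by hand.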

12.6.2 Communication Ports 

To provide direct processor-to-processor communication, the 'C40 has
six parallel bidirectional communication ports (see Chapter 8). Because these
ports have port arbitration units that handle ownership of the communica-
tion port data bus between the processors, the programmer needs to con-
centrate only on the internal operation of the communication ports. To soft-
ware, these communication ports can be treated as 32-bit on-chip data I/O
FIFO buffers. Reading data from or writing data to a communication port is
simple:

LDI     @comm_port0_input,R0    ; Read data from comm. port 0

or

STI     R0,@comm_port0_output   ; Write data to comm. port 0

If the CPU or DMA reads from an empty input FIFO or writes to a full output
FIFO, the read or write is extended until data is available in the input FIFO
(for a read) or space is available in the output FIFO (for a write). This
behavior can sometimes be used to synchronize devices. However, it slows
down processing and can even hang the processor if the data or space
never becomes available. Avoid such situations.

Each 'C40 communication port provides four flags to indicate the status of 
the port: 

ICRDY (input channel ready)

    = 0, the input channel is empty and not ready to be read.
    = 1, the input channel contains data and is ready to be read.

ICFULL (input channel full)

    = 0, the input channel is not full.
    = 1, the input channel is full.

OCRDY (output channel ready)

    = 0, the output channel is full and not ready to be written.
    = 1, the output channel is not full and is ready to be written.

OCEMPTY (output channel empty)

    = 0, the output channel is not empty.
    = 1, the output channel is empty.
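The four flags are simple functions of how many words each FIFO holds. This Python sketch assumes eight-word input and output FIFOs, which is consistent with Example 12-45 reading eight words per ICFULL interrupt; the function name is hypothetical.

```python
FIFO_DEPTH = 8   # assumed eight-word FIFOs


def port_flags(input_words, output_words, depth=FIFO_DEPTH):
    """Return the four status flags for the given FIFO occupancies."""
    for count in (input_words, output_words):
        if not 0 <= count <= depth:
            raise ValueError("FIFO occupancy out of range")
    return {
        "ICRDY": int(input_words > 0),        # input FIFO has data to read
        "ICFULL": int(input_words == depth),  # input FIFO is full
        "OCRDY": int(output_words < depth),   # output FIFO has room for a write
        "OCEMPTY": int(output_words == 0),    # output FIFO is empty
    }


print(port_flags(input_words=8, output_words=0))
# {'ICRDY': 1, 'ICFULL': 1, 'OCRDY': 1, 'OCEMPTY': 1}
```

Note that ICRDY and OCRDY describe whether a single access can proceed without stalling, while ICFULL and OCEMPTY describe whole-FIFO conditions suited to block transfers.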

These flags can be used to synchronize CPU or DMA access to the com-
munication port. Example 12-45 shows the CPU reading data from a commu-
nication port eight words at a time by using the ICFULL interrupt.
Example 12-46 shows the CPU writing data to a communication port one
word at a time by using the polling method. Examples of DMA reads and
writes from and to the communication ports are given in subsection 12.6.3,
which discusses the DMA.
Example 12-45. Read Data from Communication Port With CPU ICFULL Interrupt

*
*   TITLE READ DATA FROM COMMUNICATION PORT WITH CPU
*   ICFULL INTERRUPT
*
*   THIS EXAMPLE ASSUMES THE ICFULL 0 INTERRUPT VECTOR IS SET IN THE
*   CPU INTERRUPT VECTOR TABLE. THE EIGHT WORDS ARE READ IN
*   WHENEVER THE COMM PORT 0 INPUT FIFO IS FULL.
*
        LDA     @COMM_PORT0_CTL,AR2     ; Load comm port 0
                                        ; control reg address
        LDA     @COMM_PORT0_INPUT,AR0   ; Load comm port 0
                                        ; input FIFO address
        LDA     @INTERNAL_RAM,AR1       ; Load internal RAM address
        AND3    0F7H,*AR2,R9            ; Unhalt comm port 0
        STI     R9,*AR2                 ; input channel
        OR      04H,IIE                 ; Enable ICRDY 0 interrupt
        OR      02000H,ST               ; Enable CPU global interrupt
*
ICFULL0 PUSH    ST
        PUSH    RS
        PUSH    RE
        PUSH    RC
        RPTBD   READ                    ; Set up for loop READ
        LDI     6,RC                    ; Set repeat counter
        LDI     *AR0,R10                ; Read data from comm port 0
                                        ; input
        NOP
READ    LDI     *AR0,R10                ; Read data from comm port 0
                                        ; input
||      STI     R10,*AR1++(1)           ; Store data into internal RAM
        STI     R10,*AR1++(1)           ; Store data into internal RAM
        POP     RC
        POP     RE
        POP     RS
        POP     ST
        RETI
Example 12-46. Write Data to Communication Port With Polling Method

*
*   TITLE WRITE DATA TO COMMUNICATION PORT WITH POLLING METHOD
*
*   BIT 8 OF THE COMMUNICATION PORT 0 CONTROL REGISTER IS
*   SET ONLY WHEN THE OUTPUT FIFO IS FULL. THIS EXAMPLE CHECKS
*   THIS BIT TO MAKE SURE THERE IS SPACE AVAILABLE IN THE
*   OUTPUT FIFO.
*
        LDA     @COMM_PORT0_CTL,AR2     ; Load comm port 0 control reg
                                        ; address
        LDA     @COMM_PORT0_OUTPUT,AR0  ; Load comm port 0 output
                                        ; FIFO address
        LDA     @INTERNAL_RAM,AR1       ; Load internal RAM address
        AND3    0EFH,*AR2,R9            ; Unhalt comm port 0 output
        STI     R9,*AR2                 ; channel
        LDI     0100H,R9                ; Load mask for bit 8
WAIT    TSTB    *AR2,R9                 ; Check if output FIFO is full
        BNZ     WAIT                    ; If full, check again
WRITE_COMM
        LDI     *AR1++(1),R10           ; Read data from internal RAM
        STI     R10,*AR0                ; Store data into comm port
                                        ; 0 output
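The polling loop of Example 12-46 can be modeled in Python to show why the CPU checks bit 8 before each write. This is a toy sketch: the eight-word FIFO depth is an assumption, and the receiving processor is assumed to remove one word each time the control register is read while the FIFO is full. The class and function names are hypothetical.

```python
from collections import deque

FIFO_DEPTH = 8        # assumed eight-word output FIFO
FULL_BIT = 1 << 8     # bit 8 of the control register: set only when the output FIFO is full


class OutputChannel:
    """Toy model of a comm port output channel and the processor reading from it."""

    def __init__(self):
        self.fifo = deque()
        self.received = []

    def read_control(self):
        full = len(self.fifo) == FIFO_DEPTH
        if full:
            self.received.append(self.fifo.popleft())   # receiver drains one word
        return FULL_BIT if full else 0

    def write(self, word):
        if len(self.fifo) == FIFO_DEPTH:
            raise RuntimeError("write to a full FIFO would stall the CPU")
        self.fifo.append(word)


def polled_write(channel, data):
    """Write each word only after bit 8 reports that the output FIFO is not full."""
    polls_while_full = 0
    for word in data:
        while channel.read_control() & FULL_BIT:
            polls_while_full += 1
        channel.write(word)
    return polls_while_full


chan = OutputChannel()
print(polled_write(chan, range(10)), chan.received)   # 2 [0, 1]
```

Because every write is preceded by a check, the model never reaches the stalled-write case; the cost is the extra control-register reads counted by the return value.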



12.6.3 Direct Memory Access 

The 'C40 direct memory access (DMA) coprocessor supports six DMA 
channels (detailed information on DMA is in Chapter 9). These channels 
perform transfers to and from anywhere in the processor memory map. The 
DMA coprocessor is a self-programming device that allows data transfers 
to occur without any intervention from the CPU. It also provides a special 
split-mode to support 12 DMA channels for communication port memory 
transfer. This section contains examples of DMA programs from a very sim- 
ple single-block memory-to-memory transfer to a sophisticated memory 
transfer with autoinitialization. 
Example 12-47 shows one way to set up DMA channel 2 to initialize an
array to zero. This DMA transfer is set up to have higher priority than CPU
operations and to set the interrupt flag DMA INT2 after the transfer is
completed. The DMA control register is set to 00C40007H (refer to the
DMA control register bit functions in Table 9-1 on page 9-8 for further infor-
mation on this setup).
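Functionally, the channel set up in Example 12-47 performs the loop below: with a source index of 0, the same ZERO word is read 128 times, while a destination index of 1 steps through ARRAY. The Python model is a sketch that ignores priority, synchronization, and timing; the addresses used for ZERO and ARRAY are hypothetical.

```python
def dma_block_transfer(memory, source, source_index, destination, destination_index, count):
    """Toy model of a single-block DMA transfer: copy `count` words, advancing the
    source and destination addresses by their index registers after each word."""
    for _ in range(count):
        memory[destination] = memory[source]
        source += source_index
        destination += destination_index
    return source, destination


# Hypothetical addresses: ZERO at 0x100, a 128-word ARRAY at 0x200.
ZERO, ARRAY = 0x100, 0x200
memory = {ZERO: 0.0}
memory.update({ARRAY + i: 1.0 for i in range(128)})
dma_block_transfer(memory, ZERO, 0, ARRAY, 1, 128)
print(all(memory[ARRAY + i] == 0.0 for i in range(128)))   # True
```

A source index of 1 and a destination index of 1 would turn the same model into an ordinary block copy.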

Example 12-47. Array Initialization With DMA

*
*   TITLE ARRAY INITIALIZATION WITH DMA
*
*   THIS EXAMPLE INITIALIZES A 128-ELEMENT ARRAY TO ZERO. THE DMA
*   TRANSFER IS SET UP TO HAVE HIGHER PRIORITY OVER CPU OPERATION.
*   THE DMA INT2 INTERRUPT FLAG IS SET TO 1 AFTER THE TRANSFER IS
*   COMPLETED.
*
        .data
DMA2    .word   001000C0H       ; DMA channel 2 map address
CONTROL .word   00C40007H       ; DMA register initialization data
SOURCE  .word   ZERO
SRC_IDX .word   0
COUNT   .word   128
DESTIN  .word   ARRAY
DES_IDX .word   1
ZERO    .float  0.0             ; Array initialization value 0.0
        .bss    ARRAY,128
*
        .text
START   LDP     @DMA2           ; Load data page pointer
        LDA     @DMA2,AR0       ; Point to DMA channel 2 registers
        LDI     @SOURCE,R0      ; Initialize DMA source register
        STI     R0,*+AR0(1)
        LDI     @SRC_IDX,R0     ; Initialize DMA source index
        STI     R0,*+AR0(2)     ; register
        LDI     @COUNT,R0       ; Initialize DMA count register
        STI     R0,*+AR0(3)
        LDI     @DESTIN,R0      ; Initialize DMA destination
        STI     R0,*+AR0(4)     ; register
        LDI     @DES_IDX,R0     ; Initialize DMA destination
        STI     R0,*+AR0(5)     ; index register
        LDI     @CONTROL,R0     ; Start DMA channel 2 transfer
        STI     R0,*AR0
        .end
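All of the DMA examples in this subsection address a channel's registers as offsets from its base address in AR0. Collecting those offsets, and the channel base addresses used in the examples (001000A0H for channel 0 through 001000F0H for channel 5, in steps of 10H), gives the following Python sketch. Offsets 6 (link pointer) and 7 (auxiliary count) come from Examples 12-49 through 12-51; registers that the examples do not use are omitted, and the names are hypothetical.

```python
DMA_CHANNEL0_BASE = 0x001000A0   # DMA0 in Example 12-50
CHANNEL_STRIDE = 0x10            # channels 1, 2, 4, 5 appear at ...B0H, ...C0H, ...E0H, ...F0H

# Offsets from AR0 used by the STI instructions in Examples 12-47 through 12-51
REGISTER_OFFSETS = {
    "control": 0,
    "source": 1,
    "source_index": 2,
    "count": 3,
    "destination": 4,
    "destination_index": 5,
    "link_pointer": 6,
    "aux_count": 7,
}


def dma_register_address(channel, register):
    """Memory-mapped address of one register of DMA channel 0-5."""
    if not 0 <= channel <= 5:
        raise ValueError("the 'C40 DMA has channels 0 through 5")
    return DMA_CHANNEL0_BASE + CHANNEL_STRIDE * channel + REGISTER_OFFSETS[register]


print(hex(dma_register_address(2, "count")))   # 0x1000c3
```

The same table also gives the order of the words in an autoinitialization link list, such as the ones in Example 12-51.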
The DMA transfer can be synchronized with external interrupts, communi-
cation port ICRDY/OCRDY signals, and timer interrupts. To enable this
feature, the SYNCH MODE field (bits 6-7) of the DMA control register must
be configured to the proper value (Table 9-1 on page 9-8), and the
corresponding bits of the DMA interrupt enable (DIE) register must be set.
Example 12-48 sets up DMA channel 4 for read synchronization with the com-
munication port ICRDY signal. The DMA is set up to transfer data continu-
ously from the communication port input register until the CPU changes the
START field (bits 22-23) of the DMA control register.

Example 12-48. DMA Transfer With Communication Port ICRDY Synchronization

*
*   TITLE DMA TRANSFER WITH COMMUNICATION PORT ICRDY
*   SYNCHRONIZATION
*
*   THIS EXAMPLE SETS UP DMA CHANNEL 4 TO TRANSFER DATA FROM THE
*   COMMUNICATION PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY
*   SIGNAL READ SYNCHRONIZATION. THE TRANSFER MODE OF THE DMA IS
*   SET TO 00. THEREFORE, THE TRANSFER WON'T STOP UNTIL THE START
*   BITS OF THE DMA CONTROL REGISTER ARE CHANGED.
*
        .data
DMA4    .word   001000E0H       ; DMA channel 4 map address
CONTROL .word   00C00040H       ; DMA register initialization data
SOURCE  .word   00100081H
SRC_IDX .word   0
COUNT   .word   0               ; Transfer counter is set to
                                ; largest value
DESTIN  .word   002FF800H
DES_IDX .word   1
*
        .text
START   LDP     @DMA4           ; Load data page pointer
        LDA     @DMA4,AR0       ; Point to DMA channel 4 registers
        LDI     @SOURCE,R0      ; Initialize DMA source register
        STI     R0,*+AR0(1)
        LDI     @SRC_IDX,R0     ; Initialize DMA source index register
        STI     R0,*+AR0(2)
        LDI     @COUNT,R0       ; Initialize DMA count register
        STI     R0,*+AR0(3)
        LDI     @DESTIN,R0      ; Initialize DMA destination register
        STI     R0,*+AR0(4)
        LDI     @DES_IDX,R0     ; Initialize DMA destination index
        STI     R0,*+AR0(5)     ; register
        LDI     @CONTROL,R0     ; Start DMA channel 4 transfer
        STI     R0,*AR0
        LDHI    010H,DIE        ; Enable ICRDY 4 read sync.
        .end


If external interrupt signals are used for DMA transfer synchronization, the
corresponding IIOF(0-3) pins must also be configured as interrupt pins.
Besides accessing the communication port registers through their memory-
mapped addresses, the 'C40 DMA can use split mode to transfer data to and
from a communication port. When the split-mode bit of the DMA control
register is set, the DMA channel is separated into primary and auxiliary
channels. The primary channel transfers data from memory to the commu-
nication port output register, and the auxiliary channel transfers data from
the communication port input register to memory. The communication port
number is selected in bits 15-17 of the DMA control register.

Example 12-49 shows how to set up DMA channel 1 into split mode. The 
DMA primary channel transfers data from internal RAM to communication 
port 3 using external interrupt INT2 synchronization and bit-reversed ad- 
dressing. The DMA auxiliary channel transfers data from communication 
port 3 to internal RAM using external interrupt INT3 synchronization and lin- 
ear addressing. 
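Only three control-register fields are located explicitly in this section: SYNCH MODE (bits 6-7), START (bits 22-23), and the communication port number (bits 15-17). A Python sketch of extracting them follows; it makes no claim about the meaning of each field value, for which Table 9-1 on page 9-8 is the reference. The names are hypothetical.

```python
# (least significant bit, width) for the fields located in this section
FIELDS = {
    "SYNCH_MODE": (6, 2),   # bits 6-7
    "COM_PORT": (15, 3),    # bits 15-17
    "START": (22, 2),       # bits 22-23
}


def dma_field(control_word, name):
    """Extract one field from a 32-bit DMA control register value."""
    lsb, width = FIELDS[name]
    return (control_word >> lsb) & ((1 << width) - 1)


control = 0x03CDD0D4   # CONTROL word of Example 12-49
print({name: dma_field(control, name) for name in FIELDS})
# {'SYNCH_MODE': 3, 'COM_PORT': 3, 'START': 3}
```

The COM_PORT value of 3 matches the communication port 3 used by the split-mode example.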

Example 12-49. DMA Split-Mode Transfer With External Interrupt Synchronization

*
*   TITLE DMA SPLIT-MODE TRANSFER WITH EXTERNAL INTERRUPT
*   SYNCHRONIZATION
*
*   THIS EXAMPLE SETS UP DMA CHANNEL 1 IN SPLIT MODE. THE PRIMARY
*   CHANNEL TRANSFERS DATA FROM INTERNAL RAM TO COMM PORT 3 OUTPUT
*   REGISTER WITH EXTERNAL INTERRUPT INT2 SYNCHRONIZATION AND BIT-
*   REVERSED ADDRESSING. THE AUXILIARY CHANNEL TRANSFERS DATA FROM
*   COMMUNICATION PORT 3 INPUT REGISTER TO INTERNAL RAM WITH
*   EXTERNAL INTERRUPT INT3 SYNCHRONIZATION AND LINEAR ADDRESSING.
*
        .data
DMA1    .word   001000B0H       ; DMA channel 1 map address
CONTROL .word   03CDD0D4H       ; DMA register initialization data
SOURCE  .word   002FFC00H
SRC_IDX .word   08H             ; The same value as IR0 for bit-reversed
COUNT   .word   8
DESTIN  .word   002FF800H
DES_IDX .word   1
AUX_CNT .word   8
*
        .text
START   LDP     @DMA1           ; Load data page pointer
        LDA     @DMA1,AR0       ; Point to DMA channel 1 registers
        LDI     @SOURCE,R0      ; Initialize DMA primary source register
        STI     R0,*+AR0(1)
        LDI     @SRC_IDX,R0     ; Initialize DMA primary source index reg
        STI     R0,*+AR0(2)
        LDI     @COUNT,R0       ; Initialize DMA primary count register
        STI     R0,*+AR0(3)
        LDI     @DESTIN,R0      ; Initialize DMA aux destination
        STI     R0,*+AR0(4)     ; register
        LDI     @DES_IDX,R0     ; Initialize DMA aux destination
        STI     R0,*+AR0(5)     ; index register
        LDI     @AUX_CNT,R0     ; Initialize DMA auxiliary count
        STI     R0,*+AR0(7)     ; register
        LDI     @CONTROL,R0     ; Start DMA channel 1 transfer
        STI     R0,*AR0
        LDI     01100H,IIF      ; Configure INT2 and INT3 as
                                ; interrupt pins
        LDI     0A0H,DIE        ; Enable INT2 read and INT3 write sync.
        .end
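The primary channel of Example 12-49 reads internal RAM in bit-reversed order with an index of 08H. Bit-reversed addressing adds the index with the carry propagating from the most significant bit toward the least significant bit, which the Python sketch below emulates by reversing the bits, adding normally, and reversing back. It assumes the base address 002FFC00H has enough low-order zero bits that each address is simply the base plus the computed offset; the helper names are hypothetical.

```python
def bit_reverse(value, width):
    """Reverse the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(address, index, width=32):
    """Add with the carry propagating from the MSB toward the LSB."""
    mask = (1 << width) - 1
    total = (bit_reverse(address, width) + bit_reverse(index, width)) & mask
    return bit_reverse(total, width)


def bit_reversed_offsets(index, count):
    """Offsets from the base address for `count` accesses with index register `index`."""
    offsets, offset = [], 0
    for _ in range(count):
        offsets.append(offset)
        offset = reverse_carry_add(offset, index)
    return offsets


# Primary channel of Example 12-49: SRC_IDX = 08H, COUNT = 8
print(bit_reversed_offsets(0x08, 8))   # [0, 8, 4, 12, 2, 10, 6, 14]
```

An index of 08H is half of a 16-point block, which is the usual choice when the bit-reversed order is used to reorder FFT data.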



An advantage of the 'C40 DMA is its autoinitialization feature. It allows
you to set up DMA transfers in advance and makes the DMA operation
100 percent independent of the CPU. When the DMA is operating in auto-
initialization mode, the link pointer and auxiliary link pointer are used to ini-
tialize the registers that control the DMA operation. The link pointer may be
incremented (AUTOINIT STATIC = 0, shown in Table 9-1 on page 9-8)
or held constant (AUTOINIT STATIC = 1) during autoinitialization. This
option allows autoinitialization values to be stored either in sequential
memory locations or in stream-oriented devices such as the on-chip
communication ports or external FIFOs. When DMA SYNCH MODE is
enabled, the DMA autoinitialization can be configured to synchronize with
the same signal. Example 12-50 sets up DMA channel 0 to wait for the
communication port to input the initialization values. After DMA
autoinitialization is complete, the DMA channel starts transferring data from
the communication port input register to internal RAM.
Example 12-50. DMA Autoinitialization With Communication Port ICRDY

*
*   TITLE DMA AUTOINITIALIZATION WITH COMMUNICATION PORT ICRDY
*
*   THIS EXAMPLE SETS UP DMA CHANNEL 0 TO WAIT FOR THE COMMUNICATION
*   PORT TO INPUT THE INITIALIZATION VALUES. THE DMA AUTOINITIALIZATION
*   AND TRANSFER ARE BOTH DRIVEN BY THE ICRDY 0 FLAG. AFTER
*   DMA AUTOINIT IS COMPLETED, THE DMA CHANNEL STARTS TRANSFERRING
*   DATA FROM THE COMM PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY
*   0 READ SYNCHRONIZATION. THE VALUES IN THE COMM PORT 0 INPUT FIFO
*   SHOULD BE:
*
*   SEQUENCE | VALUE
*   ---------+-----------------------------------------------
*      1     | 00C40047H (STOP AFTER TRANSFER COMPLETED)
*            | OR 00C4054BH (REPEAT AFTER TRANSFER COMPLETED)
*      2     | 00100041H
*      3     | 0H
*      4     | 20H
*      5     | 002FF800H
*      6     | 1H
*      7     | 00100041H
*
        .data
DMA0      .word   001000A0H     ; DMA channel 0 map address
DMA_INIT  .word   0004054BH     ; DMA initialization control word
LINK      .word   00100041H     ; Comm port input register address
DMA_START .word   00C4054BH     ; DMA start control word
*
        .text
START   LDP     @DMA0           ; Load data page pointer
        LDA     @DMA0,AR0       ; Point to DMA channel 0 registers
        LDI     @DMA_INIT,R0    ; Initialize DMA control register
        STI     R0,*AR0
        LDI     @LINK,R0        ; Initialize DMA link pointer
        STI     R0,*+AR0(6)
        LDI     @DMA_START,R0   ; Start DMA channel 0 transfer
        STI     R0,*AR0
        LDI     01H,DIE         ; Enable ICRDY 0 read sync.
        .end







The DMA autoinitialization and transfer continue to repeat as long as the
control word loaded during autoinitialization keeps autoinitialization enabled
(for example, 00C4054BH rather than 00C40047H). Therefore, a DMA setup
like the one in Example 12-50 makes it possible for an external device to
control the DMA operation through the communication port.
With the autoinitialization feature, the 'C40 DMA can support a variety of
DMA operations without slowing down CPU computation. A good example
is a DMA transfer triggered by a single interrupt signal. Usually, this is
achieved by starting the DMA activity from a CPU interrupt service routine,
but this uses CPU time. With the autoinitialization feature, however, the
'C40 DMA can achieve this kind of setup without interrupting the CPU, as
shown in Example 12-51. One method is to set up a single interrupt-driven
dummy DMA transfer with autoinitialization. When the interrupt signal is set,
the DMA completes the dummy transfer and starts the autoinitialization
for the desired DMA transfer.

Example 12-51. Single-Interrupt-Driven DMA Transfer

*
*   TITLE SINGLE INTERRUPT-DRIVEN DMA TRANSFER
*
*   THIS EXAMPLE SETS UP A DUMMY DMA TRANSFER FROM INTERNAL RAM
*   TO THE SAME MEMORY WITH EXTERNAL INT0 SYNCHRONIZATION AND
*   AUTOINITIALIZATION FOR TRANSFERRING 64 DATA FROM LOCAL MEMORY
*   TO INTERNAL RAM. AFTER THE SECOND TRANSFER IS COMPLETED, THE
*   DMA IS REINITIALIZED TO THE FIRST DMA TRANSFER SETUP.
*
        .data
DMA5      .word   001000F0H     ; DMA channel 5 map address
DMA_INIT  .word   0000004BH     ; DMA initialization control word
LINK      .word   DMA1          ; 1st DMA link list address
DMA_START .word   00C0004BH     ; DMA start control word
*
DMA1      .word   00C0004BH     ; 1st dummy DMA transfer link list:
                                ; control
          .word   002FF800H     ; source
          .word   00000000H     ; source index
          .word   00000001H     ; count
          .word   002FF800H     ; destination
          .word   00000000H     ; destination index
          .word   DMA2          ; link pointer
*
DMA2      .word   00C4000BH     ; The desired DMA transfer link list:
                                ; control
          .word   00400000H     ; source
          .word   00000001H     ; source index
          .word   00000040H     ; count
          .word   002FF800H     ; destination
          .word   00000001H     ; destination index
          .word   DMA1          ; link pointer
*
        .text
START   LDP     @DMA5           ; Load data page pointer
        LDA     @DMA5,AR0       ; Point to DMA channel 5 registers
        LDI     @DMA_INIT,R0    ; Initialize DMA control register
        STI     R0,*AR0
        LDI     @LINK,R0        ; Initialize DMA link pointer
        STI     R0,*+AR0(6)
        LDI     @DMA_START,R0   ; Start DMA channel 5 transfer
        STI     R0,*AR0
        LDI     01H,IIF         ; Configure INT0 as an interrupt pin
        LDHI    0800H,DIE       ; Enable INT0 read sync. for
                                ; DMA channel 5
        .end
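In Example 12-51 each link list ends with a pointer to the other, so the channel alternates between the INT0-synchronized dummy transfer and the desired 64-word transfer without CPU involvement. The Python sketch below follows those link pointers; it keeps only the control word, count, and link fields, and the helper name is hypothetical.

```python
# Link lists of Example 12-51, reduced to the fields this sketch needs.
LINK_LIST = {
    "DMA1": {"control": 0x00C0004B, "count": 0x01, "link": "DMA2"},  # dummy, INT0 synchronized
    "DMA2": {"control": 0x00C4000B, "count": 0x40, "link": "DMA1"},  # desired 64-word transfer
}


def follow_links(start, transfers):
    """Return the order in which autoinitialization visits the link lists."""
    order, current = [], start
    for _ in range(transfers):
        order.append(current)
        current = LINK_LIST[current]["link"]
    return order


order = follow_links("DMA1", 4)
print(order)                                            # ['DMA1', 'DMA2', 'DMA1', 'DMA2']
print(sum(LINK_LIST[name]["count"] for name in order))  # 130
```

Each INT0 request therefore moves one dummy word followed by the 64 words of the desired transfer before the channel waits again.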
Chapter 13



Hardware Applications 



The TMS320C40's advanced interface design can be used to implement a 
wide variety of system configurations. Its two external buses and DMA ca- 
pability provide a flexible parallel 32-bit interface to byte- or word-wide de- 
vices; the communication ports provide a glueless interface to other 'C40s; 
and the interrupt interface, communication ports, and general-purpose digi- 
tal I/O provide communication with a multitude of peripherals. 

This chapter describes how to use the 'C40's interfaces to connect to vari-
ous external devices. Specific discussions include implementation of paral-
lel interfaces to devices with and without wait states, parallel processing
through the communication ports and port control logic, and system control
function circuit design.

Major topics discussed in this chapter are as follows: 

Section Page 

13.1 System Configuration Options Overview 13-3 

■ Categories of Interfaces on the TMS320C40 13-3 

13.2 Boot Loader Description and External ROM Interfacing 13-5

■ TMS320C40 Boot Loader Description 13-5 

■ Examples of External Memory Loads 13-8 

■ Communication Port Loading 13-8 

■ External ROM Interfacing to the TMS320C40 13-9 

■ External Memory Loading 13-14 

■ TMS320C40 Boot Loader Source Program 13-14 

13.3 Global and Local Bus Interface 13-20 

■ Zero Wait-State Interface to RAMs 13-20 

13.4 Ready Generation 13-27 

■ ORing of the Ready Signals (SWW = 10) 13-28 
■ ANDing of the Ready Signals (SWW = 11) 13-28

■ External Ready Generation 13-29

■ Ready Control Logic 13-30

■ Example Circuit 13-31

■ Page Switching Techniques 13-32

13.5 Parallel Processing Interfaces 13-37

■ Message Broadcasting From a TMS320C40 to Many
TMS320C40s 13-37

■ Shared Global Memory Interface With Fair Arbitration 13-38

■ Shared Bus Interface Overview 13-43

13.6 Bus Arbitration 13-48

■ Arbitration Implementation 13-48

■ Global Bus Arbitration and Transfer Timing 13-70

■ Arbitration Protocol Limitations 13-70 

13.7 Reset Signal Generation Control Functions 13-75 
13.1 System Configuration Options Overview

The 'C40 interfaces connect to a wide variety of device types. Each of these 
interfaces is tailored to a particular family of devices. 

13.1.1 Categories of Interfaces on the TMS320C40 

The interface types on the 'C40 fall into several categories, depending on
the devices they are intended to connect to. Each interface comprises one
or more signal lines, which transfer information and control its operation.
Figure 13-1 shows the signal line groupings for each of these interfaces.



Figure 13-1. External Interfaces to the TMS320C40

[Figure: signal groups for each interface: data, address, data enable, address enable, control, and control signal enable for the external buses; communication port interface; timer interface and I/O flags]

Note: n = 0 for Communication Port 0, n = 1 for Communication Port 1, etc.

Each interface is independent of the others, and different operations may 
be performed simultaneously on each interface. These pins are defined in 
more detail in Table 14-2 on page 14-5. 
The global and local buses implement the primary memory-mapped
interfaces to the device. These interfaces allow external devices such as 
DMA controllers and other microprocessors to share resources with one or 
more 'C40's through a common bus. 

The devices that can be interfaced to the 'C40 include memory, DMA de-
vices, and numerous parallel and serial peripherals and I/O devices. In ad-
dition, 'C40s can interface directly with each other, without external logic,
through their communication ports or their external flag pins IIOF(0-3).
Figure 13-2 illustrates a typical configuration of a 'C40 system with different
types of external devices and the interfaces to which they are connected.

Figure 13-2. Possible System Configurations

(Block diagram: TMS320C40s connected to fast local memory, peripherals,
bit I/O, and I/O devices.)
The block diagram in Figure 13-2 represents a more or less fully expanded
system. An actual design may use any subset or superset of the illustrated
configuration.
13.2 Boot Loader Description and External ROM Interfacing

13.2.1 TMS320C40 Boot Loader Description/Operation 

The boot loader provided in the on-chip ROM of the 'C40 can load and ex- 
ecute source programs that are received from a host processor, 
inexpensive ROM, or other standard memory devices. The 'C40 boot 
loader functions primarily as either a memory boot loader or a 
communication port boot loader. 

□ The memory boot loader supports user-definable byte, half-word, and
full-word data formats, which allow a source program to be loaded from
memories that are 8, 16, or 32 bits wide. The source programs to be
loaded reside in one of six predefined memory locations: 0x0030 0000,
0x4000 0000, 0x6000 0000, 0x8000 0000, 0xA000 0000, and 0xC000 0000,
as listed in Table 13-1.

□ The communication port boot loader waits for the first data input from 
one of the six communication port channels and uses that channel to 
perform the boot load. Format of the incoming data stream is similar to 
that for a memory data stream except that the source memory width is 
excluded (format is described in Table 13-2, page 13-7). 

Table 13-1 lists the IIOF(3-1) pin values that select the location from
which the source program is loaded.

Table 13-1. Boot Loader Mode Selection Using Pins IIOF(3-1)

External Pin
IIOF3  IIOF2  IIOF1   Source Program Location
  1      1      0     Load source program from address 0030 0000h
  1      0      1     Load source program from address 4000 0000h
  1      0      0     Load source program from address 6000 0000h
  0      1      1     Load source program from address 8000 0000h
  0      1      0     Load source program from address A000 0000h
  0      0      1     Load source program from address C000 0000h
  0      0      0     Reserved
  1      1      1     Load source program from communication port
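Table 13-1 can be cross-checked with a few lines of Python. This is a sketch only, and `boot_source` and `iif_compare_value` are hypothetical helper names. The IIF bit positions it uses (FLAGn, the level of pin IIOFn, at bit 4n + 2) are inferred from the compare constants in the boot loader listing (Figure 13-4), such as 04404H for the 110 case. Those constants also assume that IIOF0 is high and that every other IIF bit still holds 0 after reset.

```python
# Table 13-1: (IIOF3, IIOF2, IIOF1) -> source program location.
BOOT_SOURCES = {
    (1, 1, 0): 0x0030_0000,
    (1, 0, 1): 0x4000_0000,
    (1, 0, 0): 0x6000_0000,
    (0, 1, 1): 0x8000_0000,
    (0, 1, 0): 0xA000_0000,
    (0, 0, 1): 0xC000_0000,
    (0, 0, 0): "reserved",
    (1, 1, 1): "communication port",
}


def boot_source(iiof3: int, iiof2: int, iiof1: int):
    """Return the load address, or 'reserved' / 'communication port'."""
    return BOOT_SOURCES[(iiof3, iiof2, iiof1)]


def iif_compare_value(iiof3: int, iiof2: int, iiof1: int) -> int:
    """IIF value the ROM compares against for a given pin pattern.

    Assumption: FLAGn sits at IIF bit 4*n + 2, IIOF0 is high, and all
    other IIF bits are still 0 after reset.
    """
    value = 1 << 2  # FLAG0: IIOF0 assumed pulled high
    for n, level in ((1, iiof1), (2, iiof2), (3, iiof3)):
        if level:
            value |= 1 << (4 * n + 2)
    return value
```

For example, `iif_compare_value(1, 1, 0)` gives 4404h, the constant in the listing's `CMPI 04404H,IIF`.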
13.2.2 Boot Load Sequence

A general sequence of events in boot loading a source program is as 
follows: 

1) Select the boot loader mode by resetting the processor while driving
the on-chip ROM enable pin (ROMEN) high. The status of external pins
IIOF(3-1) indicates where to find the source program to be loaded
(memory or communication port). These options are listed in
Table 13-1. (Pins IIOF(3-1) are read as the IIOF flags in the CPU
IIF register, described in Table 3-6 on page 3-13.)

2) The boot loader takes the following steps to determine the source
program's location:

a) If an IIOF(3-1) value from 110₂ to 001₂ (6 to 1) is found, the source
program is loaded from the corresponding memory address shown
in the top six lines of Table 13-1.

b) If an IIOF(3-1) value of 000₂ (0) is found, the boot program is exited.

c) If none of the combinations 000₂-110₂ is found, the boot loader
assumes loading will be via a communication port and checks the
communication port input channels in order, port 0 through port 5.
If no input is found on a communication port, it returns to checking
the status of the IIOF(3-1) pins.

3) When the source program's data stream is found, the program is loaded
at the address found in the fifth word of the data stream (format shown
in Table 13-2), using the bus width specified in the first word (8, 16, or
32 bits wide). The first five words of the source program specify its
loading and execution criteria. The remaining words are the source
program(s) and the vector table pointers, as shown in Table 13-2.

4) An IACK instruction is executed. The IACK indicates the completion of
the boot load sequence.

5) The source program is then executed (the entry point is the first word
of the first loaded program).
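Steps 2a through 2c amount to a polling loop. The following minimal sketch models that loop. `read_iiof` and `comm_has_input` are hypothetical callbacks that stand in for the hardware. The `max_polls` limit exists only so that the model terminates; the ROM code itself keeps polling.

```python
from typing import Callable, Optional, Tuple

# IIOF(3-1) values that select a memory boot (Table 13-1).
MEMORY_SOURCES = {
    0b110: 0x0030_0000, 0b101: 0x4000_0000, 0b100: 0x6000_0000,
    0b011: 0x8000_0000, 0b010: 0xA000_0000, 0b001: 0xC000_0000,
}


def find_boot_source(
    read_iiof: Callable[[], int],
    comm_has_input: Callable[[int], bool],
    max_polls: int = 1000,
) -> Tuple[str, Optional[int]]:
    """Return ('memory', address), ('exit', None), or ('comm', port)."""
    for _ in range(max_polls):
        pins = read_iiof() & 0b111          # IIOF3 is bit 2, IIOF1 is bit 0
        if pins in MEMORY_SOURCES:          # step 2a
            return "memory", MEMORY_SOURCES[pins]
        if pins == 0b000:                   # step 2b: boot program exits
            return "exit", None
        for port in range(6):               # step 2c: ports 0 through 5
            if comm_has_input(port):
                return "comm", port
        # No input on any port: go back and re-read IIOF(3-1).
    raise TimeoutError("no boot source found within max_polls")
```

Because the pins are re-read on every pass, a system can change IIOF(3-1) while the loader waits, and the next pass picks up the new value.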

The data stream with its source program(s) should be in the format shown
in Table 13-2. The contents of words 4 through n vary for the different
source programs loaded throughout the entire data stream. The first three
words and the last three words are fixed fields that apply to all of the
source-program blocks. The eight least significant bits of the first word
specify the memory width. If byte-wide or half-word-wide memory is
selected, the loading sequence is from LSBs to MSBs.

Table 13-2. Structure of Source Program Data Stream

Word   Contents
1      Memory width where source program resides (8, 16, or 32 bits wide)
2      Value to set in the global memory interface control register
       (shown in Figure 7-2, page 7-7, and Table 7-3)
3      Value to set in the local memory interface control register
       (shown in Figure 7-2, page 7-7, and Table 7-3)
4      Block size of the source program (number of 32-bit words)
5      Destination address where the source program is loaded
6      First word of source program
...
n      Last word of source program
n+1    Word of all zeros. (If several source-program blocks are sent, word n
       above is the last word of the last source-program block. Each
       source-program block has the format shown in words 4 through n, and
       this all-zero word follows the last source-program block.)
n+2    IVTP value (interrupt vector table pointer; see Section 3.2 on page 3-15)
n+3    TVTP value (trap vector table pointer; see Section 3.2 on page 3-15)
n+4    Memory location for IACK instruction (see IACK instruction in Chapter 11)

Each source program in a multiple-block program transfer can be loaded
to a different destination. Each program block specifies its own block
size and destination address at the beginning of the block. End the
entire block program loader function by appending an all-zero word
(0000 0000h) after the last block only.

The third-to-last and second-to-last words of the source memory define
the interrupt vector table pointer (IVTP) and the trap vector table pointer
(TVTP), respectively. The last word of the source memory defines the
memory location for the IACK instruction. The IACK instruction drives the
IACK signal low while it reads that location. For this reason, the location
must be in external memory that is actually present in the system, so that
the IACK signal is brought low. The processor then begins execution of
the first code block.






Boot Loader Description and External ROM Interfacing 



It is assumed that at least one block of source code will be loaded when
the loader is invoked. Initial loader invocation with a block size of
0000 0000h produces unpredictable results.
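The stream layout in Table 13-2 can be walked as follows. This sketch assumes that the stream has already been assembled into 32-bit words; Section 13.2.5 explains how narrower ROMs are packed. `parse_boot_stream` is a hypothetical name. Setting `has_width_word=False` models the communication port format, which omits word 1.

```python
from typing import Dict, List


def parse_boot_stream(words: List[int], has_width_word: bool = True) -> Dict[str, object]:
    """Split a list of 32-bit words laid out as in Table 13-2."""
    pos = 0
    result: Dict[str, object] = {}
    if has_width_word:
        result["width"] = words[pos] & 0xFF   # 08h, 10h, or 20h
        pos += 1
    result["gmicr"], result["lmicr"] = words[pos], words[pos + 1]
    pos += 2

    blocks = []
    while True:
        size = words[pos]                     # block size in 32-bit words
        pos += 1
        if size == 0:                         # all-zero word ends the block list
            break
        dest = words[pos]                     # destination address of this block
        pos += 1
        blocks.append((dest, words[pos:pos + size]))
        pos += size
    if not blocks:
        # An initial block size of zero gives unpredictable results on the 'C40.
        raise ValueError("at least one program block is required")

    result["blocks"] = blocks
    result["ivtp"], result["tvtp"], result["iack_addr"] = words[pos:pos + 3]
    result["entry"] = blocks[0][0]            # execution starts at the first block
    return result
```

Rejecting a zero first block, rather than modeling it, reflects the note above that the hardware behavior in that case is unpredictable.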



13.2.3 Examples of External Memory Loads 

Example 13-1, Example 13-2, and Example 13-3 show memory images for
memory configured as byte wide, 16-bit wide, and 32-bit wide, respectively.
These examples assume that:

□ The status of the IIOF(3-1) pins is 110₂ after reset is deasserted
(memory load from 0030 0000h; see Table 13-1 on page 13-5).

□ The source program resides at memory location 0030 0000h and
defines the following:

■ Memory width for boot loader: 8, 16, or 32 bits

■ Global bus memory that requires one software wait state, external
RDY (SWW = 11), page size = 64K for both STRB0 and STRB1,
and active address range = 1G for both STRB0 and STRB1.

■ Local bus memory that requires two software wait states (SWW =
01), page size = 32K, and active address range = 1G for both
STRB0 and STRB1.

■ A first program block 294 words long with destination address
002F F840h.

■ A second program block 64 words long with destination address
002F F800h.

■ IVTP and TVTP, which overlap and point to the beginning of the
on-chip RAM.

■ Memory location 30 0000h for the IACK instruction.
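The example stream can be generated from the parameters listed above. In this sketch, `build_boot_stream` is a hypothetical helper. The program words are placeholder zeros, because the examples do not list the code itself. Only the stream layout and the word numbering are meant to match Examples 13-1 through 13-3.

```python
from typing import List, Sequence, Tuple

# First-word values for each memory width (Section 13.2.5).
WIDTH_CODE = {8: 0x08, 16: 0x10, 32: 0x20}


def build_boot_stream(width: int, gmicr: int, lmicr: int,
                      blocks: Sequence[Tuple[int, Sequence[int]]],
                      ivtp: int, tvtp: int, iack_addr: int) -> List[int]:
    """Return the data stream as 32-bit words in Table 13-2 order.

    Each block is (destination_address, program_words).
    """
    words = [WIDTH_CODE[width], gmicr, lmicr]
    for dest, program in blocks:
        words += [len(program), dest, *program]   # block size, destination, code
    words += [0, ivtp, tvtp, iack_addr]           # terminator, IVTP, TVTP, IACK
    return words


# Parameters of Examples 13-1 to 13-3 (program words are placeholders).
example = build_boot_stream(
    8, 0x1D7BC9F0, 0x1D739250,
    [(0x2FF840, [0] * 294), (0x2FF800, [0] * 64)],
    ivtp=0x2FF800, tvtp=0x2FF800, iack_addr=0x300000,
)
```

Word k in the examples corresponds to `example[k - 1]`. For instance, `example[365]` is the all-zero terminator shown as word 366.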

13.2.4 Communication Port Loading 

A value of all ones on IIOF(3-1) signals that the source program is being
transmitted via a communication port. Bringing all three IIOF(3-1) pins
high also allows the pins to be used as interrupt lines without any external
decode logic. With pins IIOF(3-1) all high at reset, the 'C40 determines
which channel contains the program by polling the input level of each port.
The input data sequence of the communication port boot loader is the same
as that of the memory boot loader, except that the source memory width
word is omitted (the width is fixed for the communication port boot
loader).
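The port-selection step in Figure 13-4 shifts each comm-port control register right by 9 bits and branches if the result is nonzero. The sketch below mirrors that test. The control register addresses (100040h for port 0, stepping by 10h) and the input FIFO offset (+1) come from the listing's `ADDI 040H,AR0,AR3`, `ADDI 010H,AR3`, and `*+AR3(1)`. `read_word` is a hypothetical memory-read callback. The exact meaning of each bit above bit 8 is not restated here.

```python
from typing import Callable, Optional

COMM_PORT0_CONTROL = 0x0010_0040  # AR0 (100000h) + 40h in Figure 13-4
COMM_PORT_STRIDE = 0x10           # ADDI 010H,AR3 steps to the next port


def control_register_address(port: int) -> int:
    return COMM_PORT0_CONTROL + COMM_PORT_STRIDE * port


def input_fifo_address(port: int) -> int:
    # The loader reads data with *+AR3(1), one word above the control register.
    return control_register_address(port) + 1


def first_port_with_input(read_word: Callable[[int], int]) -> Optional[int]:
    """Return the first port 0-5 that passes the listing's input test, else None."""
    for port in range(6):
        value = read_word(control_register_address(port)) & 0xFFFF_FFFF
        if value >> 9:            # LSH3 -9,*AR3,R1 followed by BNZ
            return port
    return None
```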
13.2.5 External ROM Interfacing to the TMS320C40

When the 'C40's ROMEN input pin is high and RESETLOC(1,0) = 00₂ during
reset, the memory boot loader can load programs stored in off-chip ROM
to any valid external or internal memory in the 'C40's memory map.



Because address zero (0) is reserved for the boot load, address zero 
should not be used for the reset vector when a user-defined, internal 
ROM-code mask is used. 



Regardless of the ROM width used (byte-wide, 16-bit, or 32-bit), the
8 LSBs of the first word read from the data stream specify the memory
width. As shown in the three data stream examples starting with
Example 13-1 on page 13-10, the first location holds:

□ 8-bit memories: 08h

□ 16-bit memories: 0010h

□ 32-bit memories: 00000020h
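The boot loader does not compare the first word against these exact values. The listing in Figure 13-4 tests bit 5 and then bit 4. The sketch below mirrors that order. It also returns how many further ROM locations the loader skips so that word 2 is read next: none for 32-bit memory, one for 16-bit memory via `NOP *AR1++(1)`, and three for 8-bit memory via that NOP plus `ADDI 2,AR1`. `decode_width` is a hypothetical name.

```python
from typing import Tuple


def decode_width(first_location: int) -> Tuple[int, int]:
    """Return (width_in_bits, extra_locations_skipped) like the ROM listing."""
    if first_location & (1 << 5):   # LSH 26,R1 / BN: bit 5 set -> 32 bits
        return 32, 0
    if first_location & (1 << 4):   # LSH 1,R1 / BN: bit 4 set -> 16 bits
        return 16, 1
    return 8, 3                     # otherwise byte-wide
```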

If 8- or 16-bit ROMs are used, the loading sequence is from LSBs to MSBs.
The boot loader reads the contents of 16-bit-wide memories least significant
half-word first and packs each pair of 16-bit half-words into a 32-bit word
before loading that word to memory. Similarly, the boot loader reads the
contents of byte-wide memories least significant byte first and packs each
group of four bytes into a 32-bit word before loading that word to memory.
Because the boot loader does this packing itself, no external hardware is
needed to assemble the loaded bytes into a 32-bit word. For 32-bit-wide
ROMs, no packing is necessary, because the ROM data width matches that
of the 'C40.
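The packing order can be checked with a short sketch. The hypothetical helper `split_word` produces the ROM locations for one 32-bit word, least significant part first. The hypothetical helper `pack_locations` reverses that split and keeps only the low `width` bits of each read, because a narrow ROM drives only (L)D7-0 or (L)D15-0.

```python
from typing import List, Sequence


def split_word(word: int, width: int) -> List[int]:
    """ROM locations holding one 32-bit word, least significant part first."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, 32, width)]


def pack_locations(values: Sequence[int], width: int) -> int:
    """Rebuild a 32-bit word from locations read least significant part first."""
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(values):
        word |= (value & mask) << (i * width)
    return word
```

For the global bus control word 1D7BC9F0h, `split_word(0x1D7BC9F0, 8)` yields F0h, C9h, 7Bh, 1Dh, which matches the byte order at addresses 0300004h-0300007h in Example 13-1.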

For 16-bit ROMs, the data read is expected in bit positions 0 through 15.
Thus, the half-word ROM's data lines should be connected to 'C40 data
lines (L)D15-0. For byte-wide ROMs, the data read is expected in bit
positions 0 through 7. Hence, the byte-wide ROM's data lines should be
connected to 'C40 data lines (L)D7-0. Although the 'C40 does not require
unused data lines to be pulled up to VCC, it is recommended that each
unused data line be pulled up to 5 volts through a separate 22-kilohm
resistor to minimize power dissipation.
Example 13-1. Byte-Wide Configured Memory

Word   Address              Value   Comments
1      0300000h             08h     Memory width = 8 bits
       0300001h             00h
       0300002h             00h
       0300003h             00h
2      0300004h             F0h     Global memory bus control word = 1D7BC9F0h
       0300005h             C9h     (Described in Figure 7-2 on page 7-7.)
       0300006h             7Bh
       0300007h             1Dh
3      0300008h             50h     Local memory bus control word = 1D739250h
       0300009h             92h     (Described in Figure 7-2 on page 7-7.)
       030000Ah             73h
       030000Bh             1Dh
4      030000Ch             26h     1st source program block size = 126h
       030000Dh             01h
       030000Eh             00h
       030000Fh             00h
5      0300010h             40h     1st source program block, starting addr = 2FF840h
       0300011h             F8h
       0300012h             2Fh
       0300013h             00h
6      0300014h-0300017h            1st source program block starts here (first word)
...
299    03004A8h-03004ABh            1st source program block ends here (last word)
300    03004ACh             40h     2nd source program block size = 40h
       03004ADh             00h
       03004AEh             00h
       03004AFh             00h
301    03004B0h             00h     2nd source program block, starting addr = 2FF800h
       03004B1h             F8h
       03004B2h             2Fh
       03004B3h             00h
302    03004B4h-03004B7h            2nd source program block starts here (first word)
...
365    03005B0h-03005B3h            2nd source program block ends here (last word)
366    03005B4h             00h     Value 0 to terminate the program block load
       03005B5h             00h
       03005B6h             00h
       03005B7h             00h
367    03005B8h             00h     IVTP = 002FF800h
       03005B9h             F8h
       03005BAh             2Fh
       03005BBh             00h
368    03005BCh             00h     TVTP = 002FF800h
       03005BDh             F8h
       03005BEh             2Fh
       03005BFh             00h
369    03005C0h             00h     Memory location for IACK instruction = 30 0000h
       03005C1h             00h     (This is the final word in the data stream.)
       03005C2h             30h
       03005C3h             00h

Note: Words 4 through 299 and 300 through 365 form the two source program blocks.









Example 13-2. 16-Bits-Wide Configured Memory

Word   Address              Value   Comments
1      0300000h             0010h   Memory width = 16 bits
       0300001h             0000h
2      0300002h             C9F0h   Global memory bus control word = 1D7BC9F0h
       0300003h             1D7Bh
3      0300004h             9250h   Local memory bus control word = 1D739250h
       0300005h             1D73h
4      0300006h             0126h   1st program block size = 126h
       0300007h             0000h
5      0300008h             F840h   1st program block starting addr = 2FF840h
       0300009h             002Fh
6      030000Ah-030000Bh            1st program block starts here (first word)
...
299    0300254h-0300255h            1st program block ends here (last word)
300    0300256h             0040h   2nd program block size = 40h
       0300257h             0000h
301    0300258h             F800h   2nd program block starting addr = 2FF800h
       0300259h             002Fh
302    030025Ah-030025Bh            2nd program block starts here (first word)
...
365    03002D8h-03002D9h            2nd program block ends here (last word)
366    03002DAh             0000h   Value 0 to terminate the program block load
       03002DBh             0000h
367    03002DCh             F800h   IVTP = 002FF800h
       03002DDh             002Fh
368    03002DEh             F800h   TVTP = 002FF800h
       03002DFh             002Fh
369    03002E0h             0000h   Memory location for IACK instruction = 30 0000h
       03002E1h             0030h   (This is the final word in the data stream.)

Note: Words 4 through 299 and 300 through 365 form the two source program blocks.
Example 13-3. 32-Bits-Wide Configured Memory

Word   Address    Value       Comments
1      0300000h   00000020h   Memory width = 32 bits
2      0300001h   1D7BC9F0h   Global memory bus control word = 1D7BC9F0h
3      0300002h   1D739250h   Local memory bus control word = 1D739250h
4      0300003h   00000126h   1st program block size = 126h
5      0300004h   002FF840h   1st program block starting addr = 2FF840h
6      0300005h               1st program block starts here (first word)
...
299    030012Ah               1st program block ends here (last word)
300    030012Bh   00000040h   2nd program block size = 40h
301    030012Ch   002FF800h   2nd program block starting addr = 2FF800h
302    030012Dh               2nd program block starts here (first word)
...
365    030016Ch               2nd program block ends here (last word)
366    030016Dh   00000000h   Value 0 to terminate the program block load
367    030016Eh   002FF800h   IVTP = 002FF800h
368    030016Fh   002FF800h   TVTP = 002FF800h
369    0300170h   00300000h   Address location for IACK instruction = 00300000h

Note: Words 4 through 299 and 300 through 365 form the two source program blocks.
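In all three examples, stream word k starts at base + (k - 1) × (32 / width), where width is the memory width in bits. The following sketch reproduces the addresses in the tables above. `word_location` is a hypothetical helper.

```python
def word_location(base: int, word_number: int, width: int) -> int:
    """First ROM address of 1-based stream word `word_number` for a given width."""
    locations_per_word = 32 // width   # 4 for bytes, 2 for half-words, 1 for words
    return base + (word_number - 1) * locations_per_word
```

For example, the terminating zero word (word 366) lands at 03005B4h, 03002DAh, and 030016Dh for 8-, 16-, and 32-bit memory, respectively.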







13.2.6 IIOF(3-1) Pin Loading

The load options are based on the status of IIOF(3-1) as general-purpose
input pins. Therefore, to select the correct boot loader mode, pins
IIOF(3-1) must be held at a constant, valid status value (values listed in
Table 13-1 on page 13-5) for a certain time period. For detailed
information, see the 'C40 boot loader program in Figure 13-4, starting on
page 13-14.



After the boot load is complete, the IACK signal is brought low for one
cycle. Figure 13-3 shows an example circuit that generates the IIOF(3-1)
signals for boot load selection and also allows incoming external interrupts
during normal operation. In this example, the IIOF pins stay low after reset
until the IACK signal is received.

Figure 13-3. Circuit for Generation of a Low IIOF Signal for Boot Loader Selection

(Circuit elements shown: 74S174, +5-V supply with 22-kilohm resistor, IACK
from the 'C40, RESET, external interrupt input, and the IIOFn input of the
TMS320C40, where n = 1, 2, or 3.)



13.2.7 TMS320C40 Boot Loader Source Program 

Figure 13-4. Boot Loader Source Program 



************************************************************
*
*  C40BOOT     TMS320C40 BOOT LOADER PROGRAM
*              (C) COPYRIGHT TEXAS INSTRUMENTS INC., 1990
*
*NOTE : AFTER DEVICE RESET, THE PROGRAM IS CHECKING
*       THE INPUT STATUS OF IIOF1-3 PINS AND COMMUNI-
*       CATION PORT INPUT FLAGS TO CONFIGURE ITSELF
*       WHEN ON CHIP ROM IS ENABLED (ROMEN=1). THE IIOF0
*       PIN IS ASSUMED TO BE PULLED HIGH.
*
*       THE FUNCTION SELECTION OF IIOF1-3 IS LISTED AS:
*
*       IIOF3  IIOF2  IIOF1   FUNCTION
*         1      1      0     Memory boot loader from 00300000H
*         1      0      1     Memory boot loader from 40000000H
*         1      0      0     Memory boot loader from 60000000H
*         0      1      1     Memory boot loader from 80000000H
*         0      1      0     Memory boot loader from A0000000H
*         0      0      1     Memory boot loader from C0000000H
*         0      0      0     Reserved
*         1      1      1     Communication Port boot loader
*
*       THE PROGRAM ASSUMES THE COMMUNICATION PORT BOOT
*       LOADER IS THE DEFAULT FUNCTION. IF NO OTHER
*       FUNCTION IS SELECTED, THE PROGRAM STARTS CHECKING
*       THE COMMUNICATION PORT INPUT CHANNELS. IF THERE IS
*       NO INPUT FROM A COMMUNICATION PORT, THE PROGRAM
*       RECHECKS THE IIOF(3-1) STATUS AGAIN.
*
*       MEMORY BOOT LOADER LOADS WORD, HALF-WORD, OR BYTE
*       WIDE PROGRAM TO DIFFERENT SPECIFIED LOCATIONS. THE 8
*       LSBs OF THE FIRST MEMORY WORD SPECIFY THE MEMORY WIDTH.
*       IF THE HALF-WORD OR BYTE WIDE PROGRAM IS SELECTED,
*       THE LSBs ARE LOADED FIRST AND THEN THE MSBs. THE NEXT
*       2 WORDS CONTAIN THE CONTROL WORD FOR THE GLOBAL AND
*       LOCAL MEMORY INTERFACE CONTROL REGISTERS. NEXT COME
*       THE PROGRAM BLOCKS. THE FIRST TWO WORDS OF EACH
*       PROGRAM BLOCK CONTAIN THE BLOCK SIZE AND DESTINATION
*       ADDRESS WHERE THE PROGRAM IS TO BE LOADED. WHEN THE
*       ZERO BLOCK SIZE IS READ, THE PROGRAM BLOCK LOADING
*       IS TERMINATED. THE NEXT TWO WORDS ARE THE
*       INITIAL VALUES FOR THE IVTP AND TVTP REGISTERS.
*       AFTER THE BOOT LOADING IS COMPLETED, THE IACK SIG-
*       NAL WILL BE SENT OUT ACCORDING TO THE LAST WORD OF THE
*       SOURCE MEMORY, AND THE PROGRAM COUNTER WILL
*       BRANCH TO THE STARTING ADDRESS OF THE FIRST
*       PROGRAM BLOCK.
*
*       IF THE IIOF(3-1) ARE SET UP FOR COMMUNICATION PORT
*       BOOTLOADER, THE PROCESSOR WILL WAIT FOR THE FIRST
*       INPUT FROM AN INPUT COMMUNICATION CHANNEL AND USE
*       THAT CHANNEL TO PERFORM THE DOWNLOAD. THE BEGIN-
*       NING TWO WORDS SHOULD CONTAIN THE GLOBAL AND LOCAL
*       BUS CONTROL WORDS. SIMILAR TO THE MEMORY LOADER,
*       PROGRAM CAN BE LOADED INTO DIFFERENT MEMORY
*       BLOCKS. THE FIRST TWO WORDS OF EACH PROGRAM BLOCK CON-
*       TAIN BLOCK SIZE AND MEMORY ADDRESS TO BE LOADED
*       INTO. WHEN THE ZERO BLOCK SIZE IS READ, THE PRO-
*       GRAM BLOCK LOADING IS TERMINATED. IN OTHER WORDS,
*       IN ORDER TO TERMINATE THE PROGRAM BLOCK LOADING, A
*       ZERO HAS TO BE ADDED AT THE END OF PROGRAM BLOCK.
*       THE FOLLOWING TWO WORDS ARE THE INITIAL VALUES FOR
*       THE IVTP AND TVTP REGISTERS. AFTER THE BOOT LOAD-
*       ING IS COMPLETED, THE IACK SIGNAL WILL BE SENT OUT
*       ACCORDING TO THE LAST WORD OF THE SOURCE MEMORY,
*       AND THE PROGRAM COUNTER WILL BRANCH TO THE START-
*       ING ADDRESS OF THE FIRST PROGRAM BLOCK.
*
          .page
************************************************************
*                      RESET VECTOR                        *
************************************************************
          .sect   "vectors"
RESET     .word   START           ; On hardware RESET go to START

************************************************************
*             TMS320C40 PROCESSOR BOOT LOADER              *
************************************************************
          .text
START:    CMPI    04440H,IIF      ; Test IIOF0 pin condition
          BEQ     LIFETEST        ; If low, execute life test
          LDHI    0010H,AR0       ; Load peripheral mem. map start
                                  ; addr 100000H
          LDHI    002FH,SP        ; Initialize stack pointer SP to
          OR      0FFF0H,SP       ; internal RAM address 2FFFF0H
          LDI     0,R0            ; Set start address flag off
          LDI     COM_LOAD,R10    ; Comm. port load subroutine
                                  ; address -> R10
*
* CHECK THE IIOF1-3 FOR THE BOOT LOADER
*
CHECK:    LDHI    0030H,AR1       ; Load memory address = 00300000H
          CMPI    04404H,IIF      ; Test function 110 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          LDHI    04000H,AR1      ; Load memory address = 40000000H
          CMPI    04044H,IIF      ; Test function 101 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          LDHI    06000H,AR1      ; Load memory address = 60000000H
          CMPI    04004H,IIF      ; Test function 100 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          LDHI    08000H,AR1      ; Load memory address = 80000000H
          CMPI    00444H,IIF      ; Test function 011 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          LDHI    0A000H,AR1      ; Load memory address = A0000000H
          CMPI    00404H,IIF      ; Test function 010 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          LDHI    0C000H,AR1      ; Load memory address = C0000000H
          CMPI    00044H,IIF      ; Test function 001 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ; loader
          CMPI    00004H,IIF      ; Test function 000 condition
          BEQ     RESERVED        ; If true, branch to reserve
*
* COMMUNICATION PORT BOOT LOADER
*
* CHECK COMMUNICATION PORT INPUT CHANNEL
*
          ADDI    040H,AR0,AR3    ; Point to comm. port 0
                                  ; control register addr
          LDI     5,AR1           ; Set loop counter for
                                  ; CHECK_CH loop
CHECK_CH: LSH3    -9,*AR3,R1      ; Check comm port input
          BNZ     LOAD1           ; If input exists, start comm
                                  ; port loader
          ADDI    010H,AR3        ; Point to next comm. port
                                  ; channel addr
          DBU     AR1,CHECK_CH    ; Check next comm. port
                                  ; channel input
          B       CHECK           ; Recheck the input flags
*
* MEMORY BOOT LOADER
*
* TEST MEMORY WORD WIDTH
*
MEMORY:   LDI     *AR1++(1),R1    ; Load the memory word width
          LDI     W_WIDE,R10      ; Full-word size subroutine
                                  ; address -> R10
          LSH     26,R1           ; Test bit 5 of mem. width word
          BN      LOAD0           ; If '1' start PGM loading
                                  ; (32 bits width)
          NOP     *AR1++(1)       ; Jump last half word from
                                  ; mem. word
          LDI     H_WIDE,R10      ; Half-word size subroutine
                                  ; address -> R10
          LSH     1,R1            ; Test bit 4 of mem. width word
          BN      LOAD0           ; If '1' start PGM loading
                                  ; (16 bits width)
          LDI     B_WIDE,R10      ; Byte size subroutine address
                                  ; -> R10
          ADDI    2,AR1           ; Jump last 2 bytes from
                                  ; mem. word
*
* START PROGRAM LOADING
*
LOAD0:    CALLU   R10             ; Load new word according to
                                  ; mem. width
          STI     AR2,*AR0        ; Set global bus control
                                  ; register
          CALLU   R10             ; Load new word according to
                                  ; mem. width
          STI     AR2,*+AR0(4)    ; Set local bus control
                                  ; register
LOAD2:    CALLU   R10             ; Load new word according to
                                  ; mem. width
          SUBI3   1,AR2,RC        ; Set block size for
                                  ; repeat loop
          CMPI    -1,RC           ; If 0 block size start PGM
          BEQ     IVTP_LOAD
          CALLU   R10             ; Load new word according to
                                  ; mem. width
          LDI     AR2,AR0         ; Set destination address
          LDI     R0,R0           ; Test start address loaded
                                  ; flag
          LDIZ    AR2,R9          ; Load start address if flag
                                  ; off
          LDI     -1,R0           ; Set start & dest. address
                                  ; flag on
          SUBI    1,R10           ; Sub address with loop
          CALLU   R10             ; Load block words according
                                  ; to mem. width
          LDI     1,R0            ; Set dest. address flag off
          ADDI    1,R10           ; Sub address without loop
          B       LOAD2           ; Jump to load a new block
                                  ; when loop completed
*
* INITIALIZE IVTP AND TVTP REGISTERS
*
IVTP_LOAD: CALLU  R10             ; Load new word according to
                                  ; mem. width
          LDPE    AR2,IVTP        ; Load the IVTP pointer
TVTP_LOAD: CALLU  R10             ; Load new word according to
                                  ; mem. width
          LDPE    AR2,TVTP        ; Load the TVTP pointer
          CALLU   R10             ; Load new word according to
                                  ; mem. width
          IACK    *AR2            ; Send IACK signal out
          BU      R9              ; Branch to start of program

************************************************************
*         BYTE-WIDE MEMORY BOOT LOADER SUBROUTINE          *
************************************************************
LOOP_B:   RPTB    LOAD_B          ; PGM load loop
B_WIDE:   LWL0    *AR1++(1),AR2   ; Load byte 0 (LSB)
          NOP
          LWL1    *AR1++(1),AR2   ; Join byte 1 with byte 0
          NOP
          LWL2    *AR1++(1),AR2   ; Join byte 2 with byte 0 & 1
          NOP
          LWL3    *AR1++(1),AR2   ; Join byte 3 with byte 0, 1,
                                  ; & 2
          LDI     R0,R0           ; Test load address flag
          BNN     B_END
LOAD_B:   STI     AR2,*AR0++(1)   ; Store new word to dest.
                                  ; address
*
B_END:    RETSU                   ; Return from subroutine
************************************************************
*       HALF-WORD WIDE MEMORY BOOT LOADER SUBROUTINE       *
************************************************************
LOOP_H:   RPTB    LOAD_H          ; PGM load loop
H_WIDE:   LWL0    *AR1++(1),AR2   ; Load LSB half-word
          NOP
          LWL2    *AR1++(1),AR2   ; Join MSB half-word with
                                  ; LSB half-word
          LDI     R0,R0           ; Test load address flag
          BNN     H_END
LOAD_H:   STI     AR2,*AR0++(1)   ; Store new word to dest.
                                  ; address
*
H_END:    RETSU                   ; Return from subroutine

************************************************************
*       FULL-WORD WIDE MEMORY BOOT LOADER SUBROUTINE       *
************************************************************
LOOP_W:   RPTB    LOAD_W          ; PGM load loop
W_WIDE:   LDI     *AR1++(1),AR2   ; Read a new 32 bits word
          LDI     R0,R0           ; Test load address flag
          BNN     W_END
LOAD_W:   STI     AR2,*AR0++(1)   ; Store new word to dest.
                                  ; address
W_END:    RETSU                   ; Return from subroutine

************************************************************
*        COMMUNICATION PORT BOOT LOADER SUBROUTINE         *
************************************************************
LOOP_C:   RPTB    LOAD_C          ; PGM load loop
COM_LOAD: LSH3    -9,*AR3,R1      ; Check comm port input
          BZ      COM_LOAD        ; Wait for comm port input
          LDI     *+AR3(1),AR2    ; Read a new 32 bits word
          LDI     R0,R0           ; Test load address flag
          BNN     C_END
LOAD_C:   STI     AR2,*AR0++(1)   ; Store new word to dest.
                                  ; address
*
C_END:
RESERVED: RETSU                   ; Return from subroutine

          .end
13.3 Global and Local Bus Interface

The 'C40 uses the global and local buses to access most of its
memory-mapped locations. Because these two memory interfaces are
identical in every way except for their positions in the memory map, each
example in this memory interface section focuses on only one of the two
interfaces. However, all of the examples apply to either the local or the
global bus. Additionally, each of the buses features two identical, mutually
exclusive sets of control signals:



Global Bus   Local Bus
STRB0        LSTRB0
STRB1        LSTRB1
CE0          LCE0
CE1          LCE1
RDY0         LRDY0
RDY1         LRDY1

Also, AE and DE put the global bus in high impedance, and LAE and LDE 
put the local bus in high impedance. 

Although both the global and the local buses can interface to a wide variety 
of devices, the devices most commonly interfaced are memories. There- 
fore, memory interface examples are used in this section. 

13.3.1 Zero Wait-State Interface to RAMs 

For a full-speed, zero-wait-state interface to any device, a 50-MHz 'C40
(40-ns instruction cycle time) requires a read access time of no more than
21 ns from address stable to data valid. For most memories, the access
time from chip enable is the same as the access time from address; thus, it
is possible to use 20-ns memories at full speed with a 50-MHz 'C40.
However, to use 20-ns memories properly, there can be no long delays
between the processor and the memories. Avoiding these delays is not
always possible in practice, because of interconnection delays and because
gating is sometimes required for chip-enable generation. In addition, if a
memory device with an output enable is chosen, output enable must
become active soon enough to ensure that the memory can meet the data
valid timing requirements of the 'C40. For memories with 20-ns access
times, the output-enable-active to data-valid timing parameter is typically
less than 10 ns.
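The 21-ns figure is a budget that must cover the memory's access time plus any interconnect and chip-enable gating delay. A minimal sketch of that arithmetic follows. The added-delay value is an assumption that you would take from your own board timing analysis, and `zero_wait_margin` is a hypothetical name.

```python
READ_BUDGET_NS = 21.0  # address stable to data valid, 50-MHz 'C40 (from the text)


def zero_wait_margin(memory_access_ns: float, added_delay_ns: float = 0.0) -> float:
    """Slack left in the read budget; a negative result means a wait state is needed.

    added_delay_ns lumps interconnect delay and any chip-enable gating delay;
    its value is an assumption supplied by the designer.
    """
    return READ_BUDGET_NS - (memory_access_ns + added_delay_ns)
```

A 20-ns RAM leaves only 1 ns of slack. This is why the text warns that even modest gating or interconnect delay can rule out zero-wait-state operation.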

Currently available RAMs without output enable (OE) control lines include
the 1-bit-wide organized RAMs and most of the 4-bit-wide RAMs. Those with
OE controls include the byte-wide and a few of the 4-bit-wide RAMs. Many
of the fastest RAMs do not provide OE control; they use chip-enable (CE)
controlled write cycles to ensure that data outputs do not turn on for write
operations. In CE-controlled write cycles, the write control line (WE) goes
low before CE goes low, and internal logic holds the outputs disabled until
the cycle is completed. Using CE-controlled write cycles is an efficient way
to interface fast RAMs without OE controls to the 'C40 at full speed.

13.3.1.1 RAM Interface Using One Local Strobe

Figure 13-5 shows the 'C40's local bus interfaced to Integrated Device
Technology™ IDT71258 20-ns 64K x 4-bit CMOS static RAMs with zero wait
states using chip enable-controlled write cycles. These RAMs are arranged
to implement 64K 32-bit words located at addresses 00000h through 0FFFFh
(internal ROM is assumed to be disabled), which are the first 64K words in
external memory. If these 64K words of SRAM are the only memory
controlled by LSTRB0, the LSTRB ACTIVE field of the local memory interface
control register (LMICR) should be set to its minimum value, 01111₂,
allowing LSTRB0 to be active only for the first 64K words of the 'C40's
memory space. (The memory interface control register and its various fields
are shown in Figure 7-2 on page 7-7.) In addition, because this memory is the
only memory interfaced to LSTRB0, LSTRB0 requires only one page, so the
PAGESIZE field of the LMICR should also be set to 01111₂. Also note that in
Figure 13-5, the LRDY0 input is tied low, selecting zero wait states for all
LSTRB0 accesses on the local bus. With all of the zero-wait-state memory
controlled by LSTRB0, LSTRB1 can be used to control accesses to slower
read-only memory devices or other types of memory.
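The 01111₂ setting can be checked with a little arithmetic. The sketch below assumes that a 5-bit STRB ACTIVE or PAGESIZE value n selects a window of 2^(n+1) words; that rule is inferred from the two examples in this chapter (01111₂ giving the first 64K words here, and 0Ch giving 8K-word pages in the page-switching example), so confirm it against the Chapter 7 field descriptions before relying on it. Python is used only to show the arithmetic, and the function name is hypothetical.

```python
def strb_window_words(field: int) -> int:
    """Words covered by a 5-bit STRB ACTIVE or PAGESIZE field value.

    Assumption: window size = 2**(field + 1) words, inferred from the
    examples in this chapter; check the Chapter 7 field descriptions.
    """
    if not 0 <= field <= 0b11111:
        raise ValueError("field is 5 bits wide")
    return 2 ** (field + 1)


size = strb_window_words(0b01111)
# LSTRB0 starts at address 0, so its last address is size - 1.
print(f"{size} words: LSTRB0 active for 0h through {size - 1:X}h")
```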
Figure 13-5. TMS320C40 Interface to Zero-Wait-State SRAM
In this circuit implementation, no external logic is necessary to interface the
'C40 to the memory device. This glueless interface is possible because
changes in LR/W are always framed by LSTRB. For typical memory
devices, it is necessary to hold the device inactive (CS inactive) during
changes in WE; this avoids undesired memory accesses while the address
changes. The 'C40 ensures this by having LSTRB always frame changes
in LR/W. (See Section 7.5 on page 7-17 for more information.)

13.3.1.2 Consecutive Reads Followed by a Write Interface Timing 

Figure 13-6 shows the timing of consecutive reads followed by a write. For
consecutive reads, LSTRB0 stays active (low), and LR/W stays high as long
as read cycles continue. The critical timing that must be met for
back-to-back reads is the address-valid to data-valid time. The 'C40
requires zero-wait-state memories to have an address-valid to data-valid time
of less than 21 ns. This limit is derived as follows:

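$$t_{\text{address valid}\rightarrow\text{data valid}} < t_{c(\mathrm{H1})} - \left[\, t_{\text{H1 low}\rightarrow\text{address valid}} + t_{\text{data setup before H1 low}} \,\right]$$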

For most memory devices, this time is the same as the memory access time,
which is t1 = 20 ns in this design. Thus, memories with access times of 25 ns
or more cannot meet this timing.
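As a worked check of the read budget above: the text gives a 40-ns H1 cycle and a 21-ns budget, so the two overhead terms together account for 19 ns. The sketch below uses only that combined 19-ns figure (the individual data-sheet terms are not repeated in this section) and tests candidate memory access times against it. The same numbers apply to the LSTRB-active to data-valid budget for the read that follows a write, discussed later in this section. Function names are hypothetical.

```python
H1_CYCLE_NS = 40   # one H1 cycle of a 50-MHz 'C40
OVERHEAD_NS = 19   # (H1 low to address valid) + (data setup), i.e. 40 - 21


def access_budget_ns(h1_cycle_ns: int = H1_CYCLE_NS,
                     overhead_ns: int = OVERHEAD_NS) -> int:
    """Longest address-valid to data-valid time for a zero-wait-state read."""
    return h1_cycle_ns - overhead_ns


def meets_zero_wait(memory_access_ns: int) -> bool:
    """The access time must be strictly less than the budget."""
    return memory_access_ns < access_budget_ns()


for t_acc in (20, 25):
    verdict = "OK" if meets_zero_wait(t_acc) else "too slow"
    print(f"{t_acc}-ns memory: {verdict}, margin {access_budget_ns() - t_acc} ns")
```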

Memory device timing is less critical for zero-wait-state write cycles,
because 'C40 writes take two H1 cycles. The extra cycle gives LSTRB0 enough
time to frame LR/W, preventing memories that go into high impedance slowly
at the end of a read cycle from driving the bus during the subsequent write
cycle. For the memory device used in this design (Figure 13-6), the data lines
are guaranteed to be three-stated (t2 = 10 ns) after CS goes inactive, which
gives more than 23 ns of margin before the 'C40 starts driving the bus with
write data. Also, the extra cycle with LSTRB0 inactive prevents writes to
random locations in memory while the address is changing between
consecutive writes.

For the write cycles shown in Figure 13-6 and Figure 13-7, the RAM
requires 15 ns of write data setup before CS goes high, and this design
provides at least 24 ns (t3). The RAM requires a data hold time of 0 ns (t4),
and this design provides more than 13 ns. Finally, the RAM's setup
and hold times for address (with respect to CS high) of 20 ns and 0 ns,
respectively, are also met with a clear margin.
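The write-cycle numbers above can be tabulated as required-versus-provided pairs. In this sketch the provided values are the minimums quoted in the text (at least 24 ns of data setup, more than 13 ns of data hold), so the computed margins are lower bounds; the address setup and hold checks are omitted because the text does not give the provided values. Names are hypothetical.

```python
# (required_ns, provided_ns) for the IDT71258 write cycle, as quoted above.
# Provided values are minimums, so each margin is a lower bound.
WRITE_TIMING = {
    "data setup before CS high (t3)": (15, 24),
    "data hold after CS high (t4)": (0, 13),
}


def margins(checks: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Provided minus required for each check; negative means a violation."""
    return {name: provided - required
            for name, (required, provided) in checks.items()}


for name, margin in margins(WRITE_TIMING).items():
    status = "met" if margin >= 0 else "VIOLATED"
    print(f"{name}: {status}, margin >= {margin} ns")
```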
Figure 13-6. Consecutive Reads Followed by a Write
13.3.1.3 Consecutive Writes Followed by a Read Interface Timing

Figure 13-7 shows the timing of consecutive writes followed by a read.
Notice that between consecutive writes, LR/W stays low, but LSTRB0 goes
inactive to frame the write cycles. Although 'C40 zero-wait-state writes take
two H1 cycles, internally (from the perspective of the CPU and DMA) writes
appear to take one cycle if no access to that interface is already in progress.

In the read cycle following the writes in Figure 13-7, the 'C40 requires
zero-wait-state memories to have an LSTRB-active to data-valid time of less
than 21 ns (one H1 cycle time minus the sum of the H1-low to LSTRB-active
time and the data setup time before H1 low). For most memory devices, this
time is the same as the memory access time, which is t1 = 20 ns in this
design. Thus, a margin of only 1 ns exists, leaving little time for strobe
gating if desired.

13.3.1.4 RAM Interface Using Both Local Strobes

Figure 13-8 shows the 'C40's local bus interfaced to IDT71258 RAMs
(20-ns 64K x 4-bit CMOS static RAMs) with zero wait states using
CS-controlled write cycles. These RAMs are arranged to provide 128K 32-bit
words of local memory, implemented as two 64K x 32-bit banks. One bank
is controlled by each of the two sets of control signals on the local bus. To
map these memory devices properly in the 'C40's memory space, you must
use the local memory interface control register (LMICR) to define which part
of the local bus's memory space is mapped to each of the two strobes. In
this implementation, with internal ROM disabled, LSTRB0 is mapped to the
first 64K words of the local space (addresses 0h through 0FFFFh), and
LSTRB1 is mapped to the rest of the local space (addresses 10000h
through 7FFF FFFFh). For this memory configuration, the LSTRB ACTIVE
field of the LMICR should be set to 01111₂. Also, each LSTRB requires only
one page, so the PAGESIZE fields of the LMICR should be set to 01111₂.
Also note that in Figure 13-8, the LRDY inputs are tied low, selecting zero
wait states for all accesses on the local bus.

Hence, through the use of the 'C40's four strobes (two each on the local and 
global buses), four different banks of memory can be decoded. In addition, 
the address decoding can be changed under program control by changing 
the LSTRB ACTIVE field (bits 24-28) of the LMICR or the STRB ACTIVE field
of the global memory interface control register (GMICR). If more than four
banks of memory must be
decoded or if the chosen memory device cannot meet the read cycle timing 
requirements for the 'C40 at zero wait states, page switching (discussed in 
subsection 13.4.6 on page 13-32) should be used to add an extra cycle to 
read accesses outside the current bank boundary. 
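Changing the decode under program control amounts to a read-modify-write of the STRB ACTIVE field in bits 24-28. The sketch below shows only that bit arithmetic in Python; on the 'C40 it would be done with the processor's own load, logic, and store instructions. The starting register value is a hypothetical placeholder rather than a documented reset value, and the helper names are hypothetical.

```python
STRB_ACTIVE_LSB = 24   # STRB ACTIVE occupies bits 24-28
STRB_ACTIVE_WIDTH = 5
MASK32 = 0xFFFF_FFFF


def set_field(reg: int, lsb: int, width: int, value: int) -> int:
    """Return reg with `value` written into bits lsb .. lsb+width-1."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << lsb
    return ((reg & ~mask) | (value << lsb)) & MASK32


def get_field(reg: int, lsb: int, width: int) -> int:
    """Extract bits lsb .. lsb+width-1 of reg."""
    return (reg >> lsb) & ((1 << width) - 1)


lmicr = 0x0000_0000  # hypothetical current LMICR contents
lmicr = set_field(lmicr, STRB_ACTIVE_LSB, STRB_ACTIVE_WIDTH, 0b01111)
print(f"LMICR = {lmicr:08X}h, STRB ACTIVE = "
      f"{get_field(lmicr, STRB_ACTIVE_LSB, STRB_ACTIVE_WIDTH):05b}")
```

Clearing the field before OR-ing in the new value keeps the other LMICR fields (wait-state and page-size settings) unchanged.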
Figure 13-8. TMS320C40 Interface to Zero-Wait-State SRAMs, Two Strobes
13.4 Wait States and Ready Generation

The use of wait states can greatly increase system flexibility and reduce 
hardware requirements over systems without wait-state capability. The 
'C40 has the capability of generating wait states on either the global bus or 
the local bus, and both buses have independent sets of ready control logic. 
The buses' wait-state configuration is determined by the SWW and WTCNT 
fields of the local and global memory interface control registers (see Section 7.4,
page 7-15, for a detailed description of the wait-state options). 

This section discusses ready generation from the perspective of the global 
bus interface; however, wait-state operation on the local bus is the same as 
on the global bus, so this discussion pertains equally well to both (local and 
global). Also, the local and global buses each have two sets of control
signals, R/W0, STRB0, RDY0 and R/W1, STRB1, RDY1, with each set
of control signals having its own ready signal, providing more flexibility
in support of external devices with different speeds. Since both strobes'
ready signals share the same electrical characteristics, the following
discussion focuses on one of the global bus's sets of control signals.

Wait states are generated on the basis of: 

□ the internal wait-state generator, 

□ the external ready inputs (RDY0 or RDY1), or

□ the logical AND or OR of the two (discussed in Section 7.4, page 7-15).

When enabled, internally generated wait states affect all external cycles, 
regardless of the address accessed. If different numbers of wait states are
required for various external devices, the external RDY input may be used 
to customize wait-state generation to specific system requirements. 

If either the logical OR or electrical AND (since the signals are active low) of
the external and wait-count ready signals is selected, the earlier of the two 
signals will generate a ready condition and allow the cycle to be completed. 
It is not required that both signals be present. 
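A cycle-level model makes the difference between the two combined modes concrete. In this sketch the external ready is described by the number of wait states after which it goes active (or None if it never does, as with nonexistent memory), and WTCNT is the internal count. OR mode (STRBx SWW = 10) ends the cycle on the earlier signal and AND mode (STRBx SWW = 11) on the later one, as described in the subsections that follow. Sampling-edge details are ignored, and the function name is hypothetical.

```python
from typing import Optional


def wait_states(mode: str, external: Optional[int], wtcnt: int) -> Optional[int]:
    """Wait states inserted for one bus cycle.

    mode     -- "or"  (STRBx SWW = 10): the earlier ready ends the cycle
                "and" (STRBx SWW = 11): both are required; the later one ends it
    external -- wait states before external RDY goes active, or None if never
    wtcnt    -- internal wait count, 0 to 7
    Returns None if the cycle would never complete (the bus locks up).
    """
    if not 0 <= wtcnt <= 7:
        raise ValueError("WTCNT is a 3-bit field")
    if mode == "or":
        return wtcnt if external is None else min(external, wtcnt)
    if mode == "and":
        return None if external is None else max(external, wtcnt)
    raise ValueError("mode must be 'or' or 'and'")


print(wait_states("or", 0, 5))      # fast device answers at once
print(wait_states("or", None, 5))   # slow or missing device: counter ends it
print(wait_states("and", 3, 1))     # device extends past the internal count
print(wait_states("and", None, 1))  # no external ready: cycle never ends
```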

Note: STRBx SWW Field Values

The STRBx SWW fields of the memory-interface control register are shown
in Figure 7-2 (page 7-7) and explained in Table 7-7 (page 7-16).

13.4.1 ORing of the Ready Signals (STRBx SWW = 10)

The OR of the two ready signals can be used to implement wait states for 
devices that require a greater number of wait states than are implemented 
with internal logic (up to seven). This feature is useful, for example, if a sys- 
tem contains some fast and some slow devices. In this case: 

□ Fast devices can generate ready externally with a minimum of logic. 
When fast devices are accessed, the external hardware responds 
promptly with ready, which terminates the cycle. 

□ Slow devices can use the internal wait counter for larger numbers of 
wait states. When slow devices are accessed, the external hardware 
does not respond, and the cycle is appropriately terminated after the in- 
ternal wait count. 

The OR of the two ready signals may also be used if conditions occur that
require termination of bus cycles before the number of wait states
implemented with external logic. In this case, a shorter wait count is
specified internally than the number of wait states implemented with the
external ready logic, and the bus cycle is terminated after the wait count. This
feature may also be used as a safeguard against inadvertent accesses to
nonexistent memory that would never respond with ready and would
therefore lock up the 'C40.

If the OR of the two ready signals is used, however, and the internal wait- 
state count is less than the number of wait states implemented externally, 
the external ready generation logic must have the ability to reset its 
sequencing to allow a new cycle to begin immediately following the end of 
the internal wait count. This requires that, under these conditions: 

□ consecutive cycles must be from independently decoded areas of 
memory (or from different pages in memory), and 

□ the external ready generation logic must be capable of restarting its 
sequence as soon as a new cycle begins. 

Otherwise, the external ready generation logic may lose synchronization 
with bus cycles and therefore generate improperly timed wait states. 

13.4.2 ANDing of the Ready Signals (STRBx SWW = 11) 

If the logical AND (electrical OR) of the wait count and external ready signals 
is selected, the later of the two signals will control the internal ready signal, 
but both signals must occur. Accordingly, external ready control must be im- 
plemented for each wait-state device, and the wait count ready signal must 
be enabled. 

This feature is useful if there are devices in a system that are equipped to 
provide a ready signal but cannot respond quickly enough to meet the 
'C40's timing requirements. In particular, if these devices normally indicate
a ready condition and, when accessed, respond with a wait until they be- 
come ready, the logical AND of the two ready signals can be used to save 
hardware in the system. In this case, the internal wait counter can provide 
wait states initially, and then the external ready can provide wait states after 
the external device has had time to send a not-ready indication. The internal 
wait counter then remains ready until the external device also becomes 
ready, which terminates the cycle. 

Additionally, the AND of the two ready signals may be used for extending 
the number of wait states for devices that already have external ready logic 
implemented but require additional wait states under certain unique
circumstances.

13.4.3 External Ready Generation 

In the implementation of external ready generation hardware, the particular 
technique employed depends heavily on the specific characteristics of the 
system. The optimum approach to ready generation varies, depending on 
the relative number of wait-state and nonwait-state devices in the system 
and on the maximum number of wait states required for any one device. The 
approaches discussed here are intended to be general enough for most 
applications and are easily modified to accommodate many different
system configurations. 

In general, ready generation involves the following three functions: 

1) Segmentation of the address space in some fashion to distinguish fast
and slow devices. 

2) Generation of properly timed ready indications. 

3) Logical ORing of all the separate ready timing signals together to 
connect to the physical ready input. 

Segmentation of the address space is required to obtain a unique indication 
of each particular area within the address space that requires wait states. 
This segmentation is commonly implemented in a system in the form of 
chip-select generation. Chip-select signals may be used to initiate wait 
states in many cases; however, occasionally, chip-select decoding 
considerations may provide signals that will not allow ready input timing 
requirements to be met. In this case, coarse address space segmentation 
may be made on the basis of a small number of address lines, where simpler 
gating allows signals to be generated more quickly. In either case, the signal 
indicating that a particular area of memory is being addressed is normally 
used to initiate the ready or wait-state signal. 

Once the region of address space being accessed has been established, 
a timing circuit of some sort is normally used to provide a ready indication 
to the processor at the appropriate point in the cycle to satisfy each device's
unique requirements. 

Finally, since indications of ready status from multiple devices are typically
present, the signals are logically ORed by using a single gate to drive the
RDY input.

13.4.4 Ready Control Logic 

One of two basic approaches may be taken in the implementation of ready 
control logic, depending upon the state of the ready input between
accesses. If RDY is low between accesses, the processor is always ready
unless a wait state is required; if RDY is high between accesses, the processor
will always enter a wait state unless a ready indication is generated. 

If RDY is low between accesses, control of devices that are zero-wait- 
state at full speed is straightforward; no action is necessary, because ready 
is always active unless otherwise required. Devices requiring wait states, 
however, must drive ready high fast enough to meet the input timing require- 
ments. Then, after an appropriate delay, a ready indication must be gener- 
ated. This can be quite difficult in many circumstances because wait-state 
devices are inherently slow and often require complex select decoding. 

If RDY is high between accesses, zero-wait-state devices, which tend to 
be inherently fast, can usually respond immediately with a ready indication. 
Wait-state devices may simply delay their select signals appropriately to 
generate a ready. Typically, this approach results in the most efficient
implementation of ready control logic. Figure 13-9 shows a circuit of this type,
which can be used to generate 0, 1, or 2 wait states for multiple devices in
a system.
Figure 13-9. Logic for Generation of 0, 1, or 2 Wait States for Multiple Devices
13.4.5 Example Circuit

Figure 13-9 shows how a single 7-ns 16R4 programmable logic device
(PLD) can be used to generate 0, 1, and 2 wait states for multiple devices
that are interfaced to a TMS320C40. In this example, distinct address bits
are used to select the different wait-state devices. Here, each of the three
address lines input to the 16R4 corresponds to a different speed device.
For a single 16R4 implementation, up to ten different address bits can be
used to select different speed devices.

The single output, 4Q, of the PLD is connected directly to the RDY0 input
of the TMS320C40 to signal the completion of a bus access when external
wait-state generation is desired (see Section 7.4 on page 7-15 for more
information on TMS320C40 wait-state options). Since RDY0 is sampled on
the falling edge of H1, the H3 output clock is used as the PLD clock input.

Figure 13-10 shows the state machine and equation for programming the 
16R4 PLD ready logic. The PLD language shown in this figure is ABEL. 
STRB0 is an input into the PLD that indicates that a valid TMS320C40 bus
cycle is occurring. RESET can also be used to bring the state machine back 
to the idle state. 

Notice that the RDY0 output of the PLD is not registered. An asynchronous
RDY0 signal is necessary to generate a ready signal for zero-wait-state
devices. When a zero-wait-state device is selected (its select address line
high in Figure 13-10) and STRB0 is low, the PLD asserts RDY0 low within
7 ns. Hence, RDY0 goes active fast enough to satisfy the 20-ns setup time of
RDY0 low before H1 low.

For generation of RDY0 for one and two wait states, the device-select
address bits and STRB0 are delayed one and two cycles, respectively, by the
PLD before RDY0 is brought active low. The one-H3-cycle delay required
for one-wait-state device ready generation corresponds to state wait_one
in Figure 13-10, and the two-H3-cycle delay required for two-wait-state
devices corresponds to states wait_twoa and wait_twob.
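The behavior described above can be approximated with a small cycle-level model. This is a sketch written from the prose, not a translation of the ABEL source in Figure 13-10: the select inputs sel0, sel1, and sel2 (zero-, one-, and two-wait-state devices) are hypothetical names, the state advances once per H3 clock, and the machine returns to idle only when STRB0 goes high, so back-to-back cycles with STRB0 held low are not modeled.

```python
IDLE, WAIT_ONE, WAIT_TWOA, WAIT_TWOB = "idle", "wait_one", "wait_twoa", "wait_twob"


def next_state(state: str, strb0_low: bool, sel1: bool, sel2: bool,
               reset: bool = False) -> str:
    """Registered part of the PLD: advances on each H3 clock."""
    if reset or not strb0_low:
        return IDLE
    if state == IDLE:
        if sel1:
            return WAIT_ONE
        if sel2:
            return WAIT_TWOA
        return IDLE
    if state == WAIT_TWOA:
        return WAIT_TWOB
    return state  # hold wait_one / wait_twob until STRB0 goes high


def rdy0_low(state: str, strb0_low: bool, sel0: bool) -> bool:
    """Unregistered RDY0 output: True means RDY0 is driven low (ready)."""
    return strb0_low and (sel0 or state in (WAIT_ONE, WAIT_TWOB))


def count_wait_states(sel0=False, sel1=False, sel2=False, limit=8):
    """Sampled clocks with RDY0 high before it goes low; None if never."""
    state = IDLE
    for n in range(limit):
        if rdy0_low(state, True, sel0):
            return n
        state = next_state(state, True, sel1, sel2)
    return None


for name, kwargs in (("zero", {"sel0": True}), ("one", {"sel1": True}),
                     ("two", {"sel2": True}), ("none", {})):
    print(name, count_wait_states(**kwargs))
```

With no device selected, RDY0 never goes low, which matches the "RDY high between accesses" approach of subsection 13.4.4.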

This 16R4 PLD-based design can be used to implement different numbers
of wait states for multiple devices. More devices can be selected with
TMS320C40 address lines, and a higher number of wait states can be
produced with additional PLD logic. Furthermore, this approach can be used
in conjunction with the TMS320C40's internal wait-state generator.

13.4.6 Page Switching Techniques 

The 'C40's programmable page switching feature can greatly ease system
design when large amounts of memory or slow external peripheral devices
are required. This feature inserts an interval, not otherwise present, during
which all device selects are disabled (refer to subsection 7.3.2 on page 7-13
for further information regarding page switching). During this interval, slow
devices are allowed time to turn off before other devices have the
opportunity to drive the data bus, thus avoiding bus contention.

When page switching is enabled, any time a portion of the high-order
address lines changes, as defined by the contents of the STRB0 and STRB1
PAGESIZE fields (in the global and local memory interface control
registers), the corresponding STRB and PAGE go high for one full H1 cycle.
Provided that STRB is included in chip-select decodes, this causes all devices
selected by that STRB to be disabled during this period. The next page of
devices is not enabled until STRB and PAGE go low again.

If the high-order address lines remain constant during a read cycle, the 
memory access is the same as that of a memory access without page 
switching. In addition, page switching is not required during writes, because 
these cycles exhibit an inherent one-half H1 cycle setup of address
information before STRB goes low. Thus, when you use page switching for read/
write devices, a minimum of half of one H1 cycle of address setup is pro- 
vided for all accesses outside a page boundary. Therefore, large amounts 
of memory can be implemented without wait states or extra hardware re- 
quired for isolation between pages. Also, note that access time for cycles 
during page switching is the same as that of cycles without page switching, 
and, accordingly, full-speed accesses may still be accomplished within each 
page. 
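The page-switch rule can be modeled by comparing the address bits above the page size on consecutive reads. This sketch assumes that a PAGESIZE value n gives pages of 2^(n+1) words (inferred from the 0Ch, 8K-word example below), that each zero-wait-state read inside a page takes one H1 cycle, and that a page change adds exactly one H1 cycle with STRB and PAGE high. Writes are not modeled, since the text notes they need no switch cycle; function names are hypothetical.

```python
def page_of(addr: int, pagesize_field: int) -> int:
    """Page number: the address bits above a 2**(field+1)-word page."""
    return addr >> (pagesize_field + 1)


def read_h1_cycles(addresses: list[int], pagesize_field: int) -> int:
    """H1 cycles for a run of zero-wait-state reads with page switching."""
    if not addresses:
        return 0
    cycles = 0
    current = page_of(addresses[0], pagesize_field)
    for addr in addresses:
        page = page_of(addr, pagesize_field)
        if page != current:   # high-order bits changed:
            cycles += 1       # STRB and PAGE high for one full H1 cycle
            current = page
        cycles += 1           # the read itself
    return cycles


PAGESIZE = 0x0C  # 8K-word pages, as in the CY7B185 example that follows
print(read_h1_cycles([0x0000, 0x0001, 0x0002], PAGESIZE))  # same page
print(read_h1_cycles([0x1FFF, 0x2000, 0x2001], PAGESIZE))  # one page switch
```

Only reads that cross a page boundary pay the extra cycle, which is why accesses within a page still run at full speed.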
The circuit shown in Figure 13-11 illustrates the use of page switching with
the Cypress Semiconductor™ CY7B185 15-ns 8K x 8 BiCMOS static RAM.
This circuit implements 32K 32-bit words of memory with full-speed,
zero-wait-state accesses within each page.

Figure 13-10. State Machine and Equation for the 16R4 PLD 
Figure 13-10. State Machine and Equation for the 16R4 PLD (Concluded)
Figure 13-11. Page Switching for the Cypress Semiconductor™ CY7C185
A 5-ns 16L8 PLD decodes lines A15-A13. These lines, along with STRB0,
select each of the four pages in this circuit. With the PAGESIZE field of
STRB0 of the global memory interface control register set to 0Ch, the pages
are selected on even 8K-word boundaries, starting at location zero in
external memory space.
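The four-page decode can be sketched as a function from address to selected page. The decode rule is an assumption made for illustration: page n is treated as enabled when STRB0 is low, A15 is 0, and A14-A13 equal n, which is one way A15-A13 and STRB0 could select four 8K-word pages starting at zero. The actual decode is whatever the 16L8 in Figure 13-11 implements. Address lines above A15 are ignored here, so the pattern repeats every 64K words; the function name is hypothetical.

```python
from typing import Optional


def selected_page(addr: int, strb0_low: bool) -> Optional[int]:
    """Hypothetical 16L8 decode: page 0-3 selected, or None if none is."""
    a15 = (addr >> 15) & 1
    if not strb0_low or a15:
        return None
    return (addr >> 13) & 0b11   # A14-A13 pick one of four 8K-word pages


for addr in (0x0000, 0x1FFF, 0x2000, 0x7FFF, 0x8000):
    print(f"{addr:05X}h -> page {selected_page(addr, strb0_low=True)}")
```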

This circuit cannot be implemented without page switching, because the data
outputs' turn-on and turn-off delays cause bus conflicts, and full-speed
accesses do not allow enough time for chip-select decoding for the four
pages. Here, the propagation delay of the 16L8 is involved only during page
switches, where there is sufficient time between cycles to allow new
chip-selects to be decoded.

The timing of this circuit for read operations using page switching is shown
in Figure 13-12. When a page switch occurs, the page address on address
lines A30-A13 is updated during the extra H1 cycle while STRB0 is high.
Then, after chip-select decodes have stabilized and the previously selected
page has disabled its outputs, STRB0 goes low for the next read cycle.
Further accesses occur at full speed with the normal bus timings, as long
as another page switch is not necessary. Write cycles do not require page
switching, because of the inherent address setup provided in their timings.

This timing is summarized in Table 13-3. 
Figure 13-12. Timing for Read Operations Using Bank Switching
Table 13-3. Page Switching Interface Timing

Time Interval   Event                               Time Period
t1              H1 falling to address/STRB valid    7 ns
t2              STRB to select delay                5 ns
t3              Memory disable from select          8 ns
t4              H1 falling to STRB                  7 ns
t5              STRB to select delay                5 ns
t6              Memory output enable delay          3 ns
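Adding up the table shows why the page switch avoids contention. The sketch sums the three delays on each path and, as an assumption about the timing diagram, measures the enable path from the H1 falling edge one full 40-ns H1 cycle after the edge that starts the disable path. Under that assumption, the old page is off well before the new page can drive the bus.

```python
H1_CYCLE_NS = 40  # 50-MHz 'C40

# Delays from Table 13-3, in ns.
DISABLE_PATH = {"H1 falling to address/STRB valid": 7,
                "STRB to select delay": 5,
                "memory disable from select": 8}
ENABLE_PATH = {"H1 falling to STRB": 7,
               "STRB to select delay": 5,
               "memory output enable delay": 3}

old_page_off = sum(DISABLE_PATH.values())
# Assumption: STRB falls on the H1 edge one cycle after the switch begins.
new_page_on = H1_CYCLE_NS + sum(ENABLE_PATH.values())

print(f"old page off at {old_page_off} ns, new page on at {new_page_on} ns, "
      f"separation {new_page_on - old_page_off} ns")
```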
13.5 Parallel Processing Interfaces

The 'C40 communication ports and support for shared memory are the keys 
to parallel processing design flexibility. Almost any number of processors 
can be linked together in a wide variety of configurations. In this section, 
Figure 13-14 (in three parts) illustrates 'C40 parallel processing 
configurations that are used to fulfill many signal processing system needs. 

13.5.1 Message Broadcasting From One TMS320C40 to Many TMS320C40s

Message broadcasting from one 'C40 to many 'C40s requires a simple
interface. The block diagram of one is shown in Figure 13-13. To simplify the
interface, no token transferring is done. In this design, one 'C40 is the
dedicated transmitter, and three 'C40s are dedicated receivers. No reset
circuitry is needed, because the transmitter uses communication port 0 and
the receivers use communication ports 3, 4, and 5. At reset, 'C40
communication ports 0, 1, and 2 are output ports, and communication ports
3, 4, and 5 are input ports. Due to this fixed communications configuration,
no token transfer is needed, allowing the CREQ and CACK pins of all
processors to be individually pulled up to 5 volts through 22-kΩ resistors.
Also, the STRB pins of the communicating processors can be tied together
along with the data lines CD7-0. However, if more than five receivers must be
driven by a single transmitter at the 'C40's rated speed, the STRB and CD7-0
lines need to be buffered. Since the 'C40 communication port protocol is
asynchronous, if the speed of broadcast is not critical, buffers are not needed
as long as the number of receivers is less than 30. The CRDY signal input to
the transmitter's communication port is generated by ORing the CRDY
outputs of all of the receiver communication ports, so the transmitter does
not receive a ready signal until every receiver has received the data.

In addition, to ensure that the dedicated receiver 'C40s do not try to arbitrate
for the communication port bus, you should halt the output ports of the
receiver 'C40s by setting bit 4 of their communication port control registers
to 1.
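Two pieces of this setup reduce to simple bit logic. The sketch below shows setting bit 4 of a receiver's communication port control register and the transmitter's combined CRDY level. It treats CRDY as active low (0 = ready), so the electrical OR of the receivers' outputs stays high until every receiver is ready. The register value and function names are hypothetical, and the register's address is not shown.

```python
OUTPUT_HALT_BIT = 4  # bit 4 of the communication port control register


def halt_output(cpcr: int) -> int:
    """Return the control-register value with the output-halt bit set."""
    return cpcr | (1 << OUTPUT_HALT_BIT)


def crdy_at_transmitter(receiver_crdy: list[int]) -> int:
    """Electrical OR of active-low CRDY outputs (1 = busy, 0 = ready).

    The result is 0 (ready) only when every receiver has driven CRDY low.
    """
    return int(any(receiver_crdy))


print(f"{halt_output(0x0000):04X}h")   # 0010h
print(crdy_at_transmitter([0, 0, 0]))  # all three receivers ready
print(crdy_at_transmitter([0, 1, 0]))  # one receiver still busy
```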



Figure 13-13. Message Broadcasting by One 'C40 to Many 'C40s 
13.5.2 Shared Global Memory Interface With Fair Bus Arbitration

One of the most common multiprocessing system configurations is memory 
shared by each processor in a system. Shared memory is typically 
implemented by tying the processors' data and address lines together. 
However, the shared memory interface must guarantee that no more than 
one processor is driving the shared bus at any one time; it must also allow 
all processors sharing the bus to have a chance to access shared 
resources. 

The 'C40 supports shared memory multiprocessing with its identical global 
and local port interfaces. Both interfaces have four status output signals, 
(L)STAT3-0, which identify what type of access is attempting to begin on 
the bus. These signals identify whether the 'C40 port is idle, a DMA read is
occurring, a STRB1 write is occurring, a LOCKed access to memory is
pending, etc. (as listed in Table 7-2, page 7-5). These signals can be 
interpreted to issue single access or locked access bus requests to a shared 
bus arbiter. 

To support shared address, control, and data lines, the 'C40 provides the
(L)CE, (L)AE, and (L)DE input signals. When disabled (driven high), these
signals three-state the control signals, address lines, and data lines,
respectively, of the port. These bus enable lines are asynchronous inputs to
the 'C40, which can quickly turn off bus drivers when another processor is
accessing a shared resource. However, these signals asynchronously turn
off the 'C40's local and global buses without suspending memory accesses.
To ensure that data written is seen externally and data read is valid, the
external (L)RDY should be used for wait-state generation in shared memory
designs. An (L)RDY signal should not be sent to the 'C40 until the processor
has regained access to the bus (CE, AE, DE enabled) and has had enough
time to complete its access. Hence, with bus enable and status signals, the
flexible bus interfaces of the 'C40 allow high-speed shared bus
configurations to be implemented.
Figure 13-14. TMS320C40 Parallel DSP System Architectures
Figure 13-14. TMS320C40 Parallel DSP System Architectures (Continued)
Figure 13-14. TMS320C40 Parallel DSP System Architectures (Concluded)
In this section, a 'C40 shared memory example is shown. Four 'C40s share
SRAM with their global buses tied together. A bus arbiter implemented
as a programmable logic device provides a fair scheme for processor
access to the shared bus. The design shown here uses high-speed parts but
employs a fully asynchronous handshake protocol, which keeps it general,
allowing 'C40s of varying speeds and processors other than 'C40s to be
added to this bus configuration.

13.5.3 Shared Bus Interface Overview 

Figure 13-15 and Figure 13-16 are examples of shared memory
configurations. In these figures:

□ Four 'C40s (each as shown in Figure 13-15) have their global buses 
tied together, 

□ Each shares 128K x 32 of one-wait-state SRAM, and

□ 64K of the memory is controlled by R/W0 and STRB0, and the other 64K is
controlled by R/W1 and STRB1.

The memory devices are organized as 64K x 4, 35-ns SRAMs. Because of the
'C40's bus enable signals (AE, DE, and CE), all four 'C40s' data, address,
and control lines can be tied together for a shared memory configuration.
However, since 128K words of shared memory are being implemented on
the global bus (shown in Figure 13-15 and Figure 13-16), the common
address lines are buffered to provide adequate drive to the 16 required
memory devices. Also, the memories' chip-enable lines are pulled up to 5
volts through 22-kΩ resistors to ensure that the memory devices are
disabled when no 'C40 is accessing them.
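The count of 16 memory devices follows from the organization: 128K words built from 64K-word devices needs two banks, and 32-bit words built from 4-bit devices need eight devices per bank. A small helper (hypothetical name) makes the arithmetic explicit.

```python
K = 1024


def devices_needed(words: int, word_bits: int,
                   dev_words: int, dev_bits: int) -> int:
    """Devices needed to build words x word_bits from dev_words x dev_bits parts."""
    banks = -(-words // dev_words)          # ceiling division
    per_bank = -(-word_bits // dev_bits)
    return banks * per_bank


print(devices_needed(128 * K, 32, 64 * K, 4))  # 2 banks x 8 devices = 16
```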

The required shared global bus interface logic consists of two levels of bus
arbitration logic implemented as programmable logic devices (PLDs). Each
of the 'C40s has an identical first level of logic that interfaces to the shared
second-level arbiter. The first level of logic for each of the four 'C40s consists
of one 7-ns 16R6 PLD and one 7-ns 16R4 PLD (center of Figure 13-15).
Each first-level 16R6 PLD receives status and control signals from the
corresponding 'C40, determines what kind of global bus transfer the
associated 'C40 requires, and issues a global bus request signal to the global
bus controller (GBC, bottom of Figure 13-16), which, with the bus-grant
time-out counter, implements the second level of arbitration logic. The GBC is
implemented with a 7-ns 16R8 PLD, and the time-out counter is implemented
with a 7-ns 16R4 PLD. In addition to the two GBC PLDs, a 16L8 PLD is used
to issue write enable signals to the shared memory.

Since typical high-speed PLDs do not have many registered I/O pins or
multiple clock sources, each first-level 16R6 PLD uses a 16R4 PLD to
synchronize some of the input and output signals, and the 16R8 GBC PLD
uses external flip-flops to synchronize input signals.
If a 'C40 requires uninterrupted, multicycle global bus transfers, the
first-level PLD keeps its bus-request signal active until the uninterruptible
cycles are complete. The bus controller performs arbitration between the
'C40s requesting the shared global bus. If a 'C40 is given access to the bus,
the bus controller sends its first-level PLDs a bus grant signal. The first-level
PLD then sends a bus enable signal to the 'C40, which brings its bus control,
address, and data signals out of high impedance. The first-level PLD also
sends a BUSRDYQ signal to the 'C40 to end each read or write cycle.
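The fair, rotating-priority option of the GBC can be illustrated with a round-robin model. This is a sketch under stated assumptions, not the GBC's PLD equations: the current owner keeps the bus while its request stays active (covering locked, multicycle transfers), the search for the next owner starts just after the last grant, and the bus-grant time-out counter is not modeled. The class name is hypothetical.

```python
from typing import Optional


class RotatingArbiter:
    """Round-robin bus grant among n requesters (hypothetical GBC model)."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.owner: Optional[int] = None
        self.last = n - 1  # so requester 0 is checked first after reset

    def step(self, requests: list[bool]) -> Optional[int]:
        """One arbitration clock; returns the requester holding the bus."""
        if self.owner is not None and requests[self.owner]:
            return self.owner          # held request: keep the bus
        self.owner = None
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.owner = self.last = candidate
                break
        return self.owner


arb = RotatingArbiter()
pattern = [[True] * 4, [True] * 4, [False, True, True, True],
           [True, False, True, True], [True, True, False, True]]
print([arb.step(r) for r in pattern])  # [0, 0, 1, 2, 3]
```

Because the search always resumes after the last winner, a 'C40 that keeps requesting cannot win twice in a row while others are waiting, which is the fairness property a rotating scheme provides over fixed priority.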

Figure 13-15. TMS320C40 Shared Memory Interface

Notes: 1) This figure represents one of four 'C40s and its interface (the three other 'C40s in the system have the
same configuration).

2) The shared memory (shared by the four 'C40s) and global bus controller are shown
in Figure 13-16 on the next page.

3) The fixed/rotating priority is a programmable option at the global bus controller (GBC).
Figure 13-16. TMS320C40 Shared Memory and Bus Controller Interface
For full-speed operation, the 'C40s run from separate 50-MHz
crystal-oscillator clock sources. For synchronization of shared bus control
signals, the H3 output clock of each 'C40 serves as the 16R4 PLD
synchronizer clock for first-level input and output signals. Also, the H1 output
clock of each 'C40 serves as the state machine clock for each of the
first-level 16R6 PLDs. In addition, for high-speed bus controller
synchronization, a 50-MHz crystal oscillator is used as the input clock for the
16R8 GBC PLD, the 16R4 time-out generator PLD, and the GBC input signal
synchronizers. (Note: for fastest bus arbitration, the 'C40s sharing the bus can
be synchronized by having common RESET and CLKIN inputs. If the 'C40s
are synchronized in this way, the 50-MHz input to the second-level global
control PLDs can be the common CLKIN.) AS174 D flip-flops are used as
GBC input signal synchronizers.

Because of these arbitration synchronizer delays and the 35-ns SRAMs,
accesses to the shared memory require wait states. Arbitration requires at
least two H1 cycles from BUSREQ active to BUSENABLE active, and a bus
master's first access after an arbitration win takes at least three H1
cycles; subsequent read or write accesses require only two H1 cycles.
Figure 13-17 is a timing diagram of the arbitration contest. The three
cycles required for the first access give the old bus master enough time
to stop driving the bus after an arbitration loss, and give the new bus
master's control signals enough time to go active and inactive to complete
a 35-ns memory access. Three-cycle memory accesses also allow enough time
for signal buffering between the processor bus and memory (buffer delays
are less than 15 ns with commercially available parts).
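To put these cycle counts in time units, here is a rough Python sketch. It
assumes H1 runs at half the CLKIN rate (a 40-ns H1 cycle with the 50-MHz
clocks described above), uses the minimum counts stated in the preceding
paragraph, and treats arbitration and the first access as strictly
sequential. The actual overlap is shown only in Figure 13-17, so treat the
results as estimates, not specifications.

```python
# Minimum shared-bus timing for the four-'C40 design, using the cycle
# counts given in the text.
# ASSUMPTION: H1 runs at half the CLKIN rate, so a 50-MHz CLKIN
# gives a 40-ns H1 cycle.

CLKIN_MHZ = 50.0
H1_NS = 2 * 1000.0 / CLKIN_MHZ  # 40.0 ns per H1 cycle

ARBITRATION_CYCLES = 2   # minimum, BUSREQ active -> BUSENABLE active
FIRST_ACCESS_CYCLES = 3  # first access after an arbitration win
NEXT_ACCESS_CYCLES = 2   # each later back-to-back access


def burst_time_ns(n_accesses: int) -> float:
    """Minimum time, in ns, from BUSREQ active to the end of n accesses."""
    if n_accesses < 1:
        raise ValueError("n_accesses must be at least 1")
    cycles = (ARBITRATION_CYCLES
              + FIRST_ACCESS_CYCLES
              + NEXT_ACCESS_CYCLES * (n_accesses - 1))
    return cycles * H1_NS


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"{n:2d} access(es): {burst_time_ns(n):6.0f} ns")
```

Under these assumptions, an isolated access costs at least 200 ns, while a
long burst approaches the two-cycle (80-ns) rate per access.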

The following subsection describes the global bus configuration used with
this shared-memory design.

13.5.3.1 Global Memory Interface Control Register (GMICR) Configuration

For this shared-memory configuration, program the global bus with the
following GMICR field values:

SWW = 00 (RDYint = RDYext)
STRB ACTIVE = 01111₂
PAGESIZE = 01111₂
STRB SWITCH = 0

In addition, configure IIOF1 as a general-purpose output pin. Driving
IIOF1 low signals that a high-priority DMA request is active.




Figure 13-17. Successful TMS320C40 Arbitration and Data Read From Shared Bus Memory Followed 
by an Unsuccessful Arbitration Contest 
13.6 Bus Arbitration

13.6.1 Arbitration Implementation 

Arbitration on the bus is implemented with two levels of logic. The first
level consists of four identically programmed 7-ns 16R6 PLDs and four
identically programmed 16R4 PLDs, with one 16R4 and one 16R6 associated
with each 'C40. The 'C40 signals needed for arbitration are STAT(3-0),
STRB0, STRB1, LOCK, and IIOF1 (PAGE can be used in designs where page
switching is necessary). IIOF1 should be configured as an output pin and
used to indicate that a high-priority DMA transfer is active; applications
software should set IIOF1 low before priority DMA cycles start. Figure
13-18 illustrates the state machine for each of the first-level 16R6 PLDs.
Each first-level PLD sends an active-low BUSREQ signal to the global bus
controller, the second level of arbitration logic. The global bus
controller sends a BUSGRANT signal to the requesting 'C40's first-level
logic when that 'C40 has been granted control of the global bus. If an
interlocked or high-priority DMA bus request has been granted, the
first-level logic keeps its BUSREQ asserted low for as long as interlocked
or priority DMA cycles are required. The bus controller sees BUSREQ remain
active and lets the current 'C40 bus master keep the bus until the
interlocked or priority DMA operations are complete.

After a high-priority DMA bus cycle is complete, the 'C40 applications
software should clear IIOF1 (set it to logic level 1). Similarly, an
interlocked access to memory should always end with a SIGI, STII, or STFI
operation to bring LOCK inactive. As long as priority accesses are
completed by making IIOF1 or LOCK inactive, the first-level PLD always has
an opportunity to bring its BUSREQ inactive, which prevents shared bus
deadlock.

When the 'C40 associated with a first-level PLD is not the global bus
master (that is, cannot access the global bus), the first-level PLD sends
a logic-level-one BUSRDY signal to that 'C40, extending any pending bus
cycle until the 'C40 becomes bus master and has completed an access. In
addition, each first-level logic block sends both a BUSENABLE and a
CTLENABLE signal to the corresponding 'C40. BUSENABLE is connected to the
DE and AE pins, and CTLENABLE is connected to the CE pin of the
corresponding 'C40. While another 'C40 in the system is accessing the
shared bus, these two signals hold the bus control signals and the address
and data lines in high impedance.
Notes: 1) In this state diagram, the output signals are all shown as
active high for diagram clarity.

2) A "!" in front of a signal indicates that it is not active (deasserted).

3) & = logical AND of signals; # = logical OR of signals.
For proper system reset operation, a RESET signal clears the bus requests
to the global bus controller and sends a logic-level-one BUSRDY and
BUSENABLE signal to each 'C40. This extends upcoming bus cycles and keeps
the bus in high impedance until that 'C40 has been granted access to the
bus.

Figure 13-19 shows the equations for programming the 16R6 PLD used for the
first-level logic. The listing is written in ABEL, whose PLD language is
used to describe the state machine illustrated in Figure 13-18.



Note: Active-Low Indicators

In listings (e.g., Figure 13-19), an underscore following a signal name
(e.g., busreq_) indicates that the signal is active low. In regular text,
such signals are overbarred (e.g., BUSREQ).

The three PLD outputs busreq_, busenable_, and busrdy_ serve as three of
the output state bits. The start_state and park_state bits (which indicate
the start state and the park state) are the other two output state bits.
The ABEL description also includes test vectors for the state machine.
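The Python sketch below decodes the five-bit output-state vector
[start_state, park_state, busreq_, busenable_, busrdy_] into the signals
each state asserts. The idle, req_cycle, fin_cycle, and park encodings are
taken from the Figure 13-19 listing. The strt_cycl and do_cycle encodings
are not legible in this copy of the listing; the values used here are
inferred from the state descriptions in the text and should be treated as
assumptions.

```python
# Decode the first-level PLD output-state vector
# [start_state, park_state, busreq_, busenable_, busrdy_] (MSB first).
# Every bit is active low: 0 means the flag or signal is asserted.

STATE_BITS = ("start_state", "park_state", "busreq_", "busenable_", "busrdy_")

STATES = {
    "idle":      0b11111,  # from the listing
    "req_cycle": 0b11011,  # from the listing
    "strt_cycl": 0b01001,  # ASSUMPTION: inferred from the text
    "do_cycle":  0b11001,  # ASSUMPTION: inferred from the text
    "fin_cycle": 0b11000,  # from the listing
    "park":      0b10001,  # from the listing
}


def asserted(code: int) -> list[str]:
    """Return the names of the bits that are driven low (asserted)."""
    width = len(STATE_BITS)
    return [name for i, name in enumerate(STATE_BITS)
            if not (code >> (width - 1 - i)) & 1]


if __name__ == "__main__":
    for state, code in STATES.items():
        print(f"{state:9s} ^b{code:05b} ({code:2d}): {asserted(code)}")
```

Read this way, busreq_ is the only bit low in req_cycle, and busrdy_ goes
low only in fin_cycle, which matches the descriptions of the six states.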

The PLDs described in Figure 13-19 and Figure 13-20 work together to
interface to the GBC. Figure 13-20 (page 13-60) shows the equations for
programming the 16R4 PLD used to synchronize the first-level input and
output signals.
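The behavioral sketch below models the 16R4 synchronizer equations in
Figure 13-20. It assumes sync_ae_, rdy_, and bg_sync_ are registered
outputs clocked by H3, and that ctrl_enable_ is a combinational output that
goes low only while both busenable_ and its registered copy are low. That
reading matches the test vectors in the figure: when busenable_ goes low
without an H3 edge, the first output stays high. Signals are active low and
represented as 0/1.

```python
from dataclasses import dataclass


@dataclass
class Sync16R4:
    """Behavioral sketch of the first-level 16R4 synchronizer (Figure 13-20).

    Signals are active low and modeled as 0/1 integers.
    ASSUMPTION: sync_ae_, rdy_ and bg_sync_ are registered on the H3 clock
    edge; ctrl_enable_ is a combinational output.
    """
    sync_ae_: int = 1   # registered copy of busenable_
    rdy_: int = 1       # registered copy of busrdy_
    bg_sync_: int = 1   # registered copy of bg_

    def clock(self, busenable_: int, busrdy_: int, bg_: int) -> None:
        """Apply one H3 edge: each registered output copies its input."""
        self.sync_ae_ = busenable_
        self.rdy_ = busrdy_
        self.bg_sync_ = bg_

    def ctrl_enable_(self, busenable_: int) -> int:
        """!ctrl_enable_ = !sync_ae_ & !busenable_ (low only if both low)."""
        return 0 if (self.sync_ae_ == 0 and busenable_ == 0) else 1


if __name__ == "__main__":
    s = Sync16R4()
    print(s.ctrl_enable_(0))   # busenable_ just went low: CE_ still high
    s.clock(busenable_=0, busrdy_=1, bg_=0)
    print(s.ctrl_enable_(0))   # one H3 edge later: CE_ low
    print(s.ctrl_enable_(1))   # busenable_ released: CE_ high at once
```

Under this reading, CE is enabled one H3 cycle after BUSENABLE but is
released as soon as BUSENABLE goes inactive. The control lines are
therefore driven a little late after a grant but released promptly when
the bus is lost.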
Figure 13-19. PLD Equations for Programming the 16R6 PLD (First-Level Logic)

c40u2        device 'P16R6';

"inputs for global interface logic
h1           Pin 1;    "clock input
priDMA_      Pin 2;    "flag req output used to signal priority DMA
stat3        Pin 3;    "stat3=0 STRB0 access, stat3=1 STRB1 access
stat2        Pin 4;
stat1        Pin 5;
             Pin 6;
strb0_
strb1_       Pin 8;    "rdy signal from the external expansion connector
bg_          Pin 9;    "busgrant (from bus arbiter)
lock_        Pin 12;
reset_       Pin 19;

"outputs for global interface logic
start_state            ; "low if the output state is the strt_cycle state
busreq_

"define machine state bits
" [start, park, busreq_, busenable_, busrdy_];

idle      = ^b11111; "31
req_cycle = ^b11011; "27
strt_cycl =
do_cycle  =
fin_cycle = ^b11000; "24
park      = ^b10001; "17

"convert to positive logic to make the test vectors easier to understand
lock   = !lock_;
bg     = !bg_;
priDMA = !priDMA_;

idle_stat = (stat2 & stat1 & stat0); "the bus is idle when all are hi

"out state
ost = [start_state, park_state, busreq_, busenable_, busrdy_];

c,H,L,X = .C.,1,0,.X.;
@page

state_diagram ost

state idle:
   case (!reset_ # idle_stat)         : idle;
        ( reset_ & !idle_stat)         : req_cycle;
   endcase;

state req_cycle:
   case (!reset_ # idle_stat)         : idle;
        ( reset_ & bg_ & !idle_stat)   : req_cycle;
        ( reset_ & !bg_ & !idle_stat)  : strt_cycl;
   endcase;

state strt_cycl:
   case (!reset_)  : idle;
        ( reset_)  : do_cycle;
   endcase;

state do_cycle:
   case (!reset_)  : idle;
        ( reset_)  : fin_cycle;

!gwe_ := reset_ & stat2 & !idle_stat & ((!busreq_ & !bg_) # busenable_);
@page

"Test 1st level global arbitration logic
test_vectors
([h1, stat3, stat2, stat1, stat0, lock_, priDMA, strb0_, bg, strb1_, reset_] -> [ost, gwe_])
[c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];
[c, X, H, L, H, H, L, X, L, H, H] -> [req_cycle,H];
[c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];

[c, X, X, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[c, X, H, H, X, L, X, X, X, H, H] -> [strt_cycl,L];
[c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];
"vector 7
[c, X, X, X, H, X, X, X, L, H, H] -> [req_cycle,H];
    X, H, X, L, X, X, X, H, H, H] -> [do_cycle,L];
[c, X, H, X, L, X, X, H, H, H, H] -> [fin_cycle,L];
[c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];
"vector 16
[c, X, X, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[c, X, H, X, L, X, X, X, H, H, H] -> [strt_cycl,L];
[c, X, H, X, L, X, X, X, H, H, H] -> [do_cycle,L];
[c, X, H, X, L, X, X, H, H, H, H] -> [fin_cycle,L];
[c, L, H, X, L, X, X, H, H, H, H] -> [park,L];
[c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];

[c, X, X, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[c, X, H, X, L, X, X, X, H, H, H] -> [strt_cycl,L];
[c, X, H, X, L, X, X, X, H, H, H] -> [do_cycle,L];
[c, X, H, X, L, X, X, H, H, H, H] -> [fin_cycle,L];
[c, L, H, X, L, X, X, H, H, H, H] -> [park,L];

"vector 09
[c, X, H, X, X, H, L, X, L, H, H] -> [idle,L];
[c, X, H, H, H, X, X, X, X, H, H] -> [idle,H];
[c, X, L, H, L, H, L, X, L, H, H] -> [req_cycle,H];
[c, X, L, H, L, X, X, H, L, H] -> [req_cycle,H];
[c, X, L, H, L, X, X, H, H, H, H] -> [strt_cycl,H];
[c, X, L, H, L, X, X, H, L, H, H] -> [do_cycle,H];
[c, X, L, H, L, H, L, H, L, H, H] -> [fin_cycle,H];
[c, L, L, H, L, H, L, H, L, H, H] -> [park,H];
[c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];

"vector 18
[c, X, H, H, H, L, X, H, L, H, H] -> [idle,H];
[c, H, L, X, H, L, X, H, L, H, H] -> [req_cycle];
[c, H, L, X, H, L, X, H, L, H, H] -> [req_cycle,H];
[c, H, L, X, H, X, X, H, H, H, H] -> [strt_cycl,H];
[c, H, L, X, H, L, X, H, H, L, H] -> [do_cycle,H];
[c, H, L, X, H, H, L, H, L, L, H] -> [fin_cycle,H];
[c, H, L, X, H, H, H, X, L, H] -> [park,H];
[L, H, L, X, H, H, L, H, X, H, H] -> [park,H];
[c, H, L, X, H, H, H, H, L, L, H] -> [fin_cycle,H];
[c, H, L, X, H, H, H, H, X, L, H] -> [park,H];
[L, X, L, X, H, H, H, H, X, H, H] -> [park,H];
[c, L, L, L, H, L, X, H, H, H, H] -> [do_cycle,H];
[c, L, L, L, H, H, L, H, L, H, H] -> [fin_cycle,H];
[c, L, L, L, H, H, L, H, X, L, H] -> [park,H];
[c, H, H, H, H, H, H, H, X, H, H] -> [park,H];
[c, H, H, H, H, H, H, H, X, H, H] -> [park,H];
[c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
"vector 37
[c, X, H, H, H, L, X, H, L, H, H] -> [idle,H];
[c, X, L, L, L, L, X, H, L, H, H] -> [req_cycle,H];
[c, X, L, L, L, L, X, H, L, H, H] -> [req_cycle,H];
[c, X, L, L, L, X, X, H, H, H, H] -> [strt_cycl,H];
[c, X, L, L, L, L, X, H, H, H, H] -> [do_cycle,H];
[c, X, L, L, L, H, L, H, L, H, H] -> [fin_cycle,H];
[c, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[c, H, L, L, L, L, H, H, X, L, H] -> [park,H];
[L, X, L, L, L, L, H, H, X, H, H] -> [park,H];
[c, H, L, L, L, L, H, H, H, H, H] -> [do_cycle,H];
[c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[c, H, L, L, L, H, L, H, X, L, H] -> [park,H];
[L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[c, H, H, H, H, L, H, H, X, H, H] -> [park,H];
[c, H, H, H, H, L, H, H, X, H, H] -> [park,H];
[c, L, L, X, X, H, L, L, H, H, H] -> [fin_cycle,H];
[c, L, L, X, X, H, L, H, X, H, H] -> [park,H];
[c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
"vector 62
[c, X, H, H, H, L, X, H, L, H, H] -> [idle,H];
[c, X, L, X, H, L, X, H, L, H, H] -> [req_cycle,H];
[c, X, L, X, H, X, X, H, H, H, H] -> [strt_cycl,H];
[c, X, L, X, H, L, X, H, H, L, H] -> [do_cycle,H];
[c, H, L, X, H, H, L, H, L, L, H] -> [fin_cycle,H];
[c, H, L, X, H, H, L, H, X, L, H] -> [park,H];
[L, H, L, X, H, H, L, H, X, H, H] -> [park,H];
[c, H, L, X, H, H, H, H, L, L, H] -> [fin_cycle,H];
[c, H, L, X, H, H, H, H, X, L, H] -> [park,H];
[L, H, L, X, H, H, H, H, X, H, H] -> [park,H];
[c, H, H, L, H, L, X, H, H, H, H] -> [do_cycle,L];
[c, H, H, L, H, H, L, H, L, H, H] -> [fin_cycle,L];
[c, H, H, L, H, H, L, H, X, H, H] -> [park,L];
[c, X, L, L, L, L, H, H, H, H, H] -> [do_cycle,H];
[c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[c, H, L, L, L, H, L, H, X, L, H] -> [park,H];
[L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[c, X, L, X, X, H, L, L, H, H, H] -> [fin_cycle,H];
[c, L, L, X, X, H, L, H, X, H, H] -> [park,H];
[c, H, H, H, H, X, X, H, H, H, H] -> [park,H];
[c, H, H, H, H, H, H, H, H, H, H] -> [park,H];
[c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
@page
[c, X, H, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[c, X, H, X, L, X, X, X, H, H, H] -> [strt_cycl,L];
[c, X, H, X, L, X, X, X, H, H, H] -> [do_cycle,L];
[c, X, H, X, L, X, X, H, H, H, H] -> [fin_cycle,L];
[c, L, H, X, L, X, X, H, H, H, H] -> [park,L];
[c, X, H, X, X, H, L, X, L, H, H] -> [idle,L];
[c, X, H, L, L, X, X, X, L, H, H] -> [req_cycle,H];
" ([h1, stat3, stat2, stat1, stat0, lock_, priDMA, strb0_, bg, strb1_, reset_] -> [ost, gwe_])

end c40_local_glob_bus_interf
The six PLD states are idle, request_cycle, start_cycle, do_cycle,
finish_cycle, and park (named req_cycle, strt_cycl, and fin_cycle in the
listing).

1) After reset, the first-level PLD's state machine starts in the idle
state and transitions to the request_cycle state when a global bus
transfer is required.

2) The transition to request_cycle occurs when any of the 'C40 status
lines (STAT2-0) is low (when the status lines are all high, the bus is
idle). In this state, the BUSREQ signal becomes active and is sent to the
GBC PLD.

3) When the PLD receives a BUSGRANT signal, the state machine transitions
to the start_cycle state. In the start_cycle, do_cycle, finish_cycle, and
park states, BUSREQ and BUSENABLE are active.

4) From the start_cycle state, the state machine transitions to the
do_cycle state during the next H1 cycle.

5) From the do_cycle state, the state machine transitions to the
finish_cycle state in the next H1 cycle. In this state, the BUSRDY signal
is active. BUSRDY indicates to the 'C40 that the memory access has been
completed and that another access can be started.

6) From the finish_cycle state, the state machine transitions to the park
state during the next H1 cycle. BUSRDY goes inactive in anticipation of
another bus cycle starting.
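Steps 1 through 6, together with the park-state exits described in the
paragraphs that follow, can be modeled as a next-state function. In the
Python sketch below, the transitions out of idle, request_cycle,
start_cycle, and do_cycle follow the state_diagram in Figure 13-19, and
finish_cycle to park follows step 6. The park-state behavior is a
simplified reading of the text. The release input is hypothetical: it
stands in for the park-state exit conditions, which are not reproduced in
this section.

```python
from enum import Enum


class S(Enum):
    IDLE = "idle"
    REQ_CYCLE = "req_cycle"
    STRT_CYCL = "strt_cycl"
    DO_CYCLE = "do_cycle"
    FIN_CYCLE = "fin_cycle"
    PARK = "park"


def next_state(state: S, *, reset_: int = 1, idle_stat: bool = True,
               bg: bool = False, strb_stayed_low: bool = True,
               release: bool = False) -> S:
    """Advance the first-level arbitration state machine by one H1 cycle.

    reset_          -- 1 for normal operation, 0 while RESET is asserted
    idle_stat       -- True when STAT2-0 are all high (no access pending)
    bg              -- True when BUSGRANT is asserted (positive logic)
    strb_stayed_low -- PARK only: True if STRB stayed low between accesses
    release         -- HYPOTHETICAL input standing in for the park-state
                       exit conditions not reproduced in this section
    """
    if not reset_:
        return S.IDLE
    if state is S.IDLE:
        return S.IDLE if idle_stat else S.REQ_CYCLE
    if state is S.REQ_CYCLE:
        if idle_stat:
            return S.IDLE
        return S.STRT_CYCL if bg else S.REQ_CYCLE
    if state is S.STRT_CYCL:
        return S.DO_CYCLE
    if state is S.DO_CYCLE:
        return S.FIN_CYCLE
    if state is S.FIN_CYCLE:
        return S.PARK
    # PARK: keep the bus parked unless something forces a release (assumption).
    if release:
        return S.IDLE
    if idle_stat:
        return S.PARK
    # Back-to-back access: skip do_cycle if STRB never went high.
    return S.FIN_CYCLE if strb_stayed_low else S.DO_CYCLE


if __name__ == "__main__":
    state = S.IDLE
    steps = [
        dict(idle_stat=False),            # access pending -> req_cycle
        dict(idle_stat=False),            # no grant yet -> stay
        dict(idle_stat=False, bg=True),   # granted -> strt_cycl
        {}, {}, {},                       # do_cycle, fin_cycle, park
        dict(idle_stat=False),            # back-to-back read -> fin_cycle
    ]
    for inputs in steps:
        state = next_state(state, **inputs)
        print(state.value)
```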

Bus parking is implemented in this arbitration protocol so that the current
bus master can retain control of the bus and keep accessing global memory
as long as consecutive interlocked or priority DMA cycles are required, or
as long as no other processor is requesting the bus. Bus parking reduces
memory access latency when only one 'C40 needs the global bus over a period
of time.

Notice that when the state machine leaves the park state to let the
current bus master perform another shared memory access, it can
transition to either the finish_cycle or the do_cycle state, depending on
the level of STRB0 or STRB1. A STRB signal that remains low between
accesses indicates back-to-back read cycles, which require only two H1
cycles each with 35-ns memories; in that case, the state machine
transitions from the park state directly to finish_cycle. If the STRB
signal goes high for one H1 cycle and then back low between accesses, the
state machine transitions from the park state to do_cycle, allowing one
cycle for STRB high and two for the subsequent access.

The global bus controller (second-level logic) is implemented as a 16R8
PLD. This PLD takes as inputs the outputs of each of the four first-level
PLDs. Hence, the GBC has four BUSREQ inputs, one from the first-level
logic PLDs associated with each 'C40. The GBC drives four outputs: the
BUSGRANT signals associated with each 'C40's first-level arbitration logic.

Figure 13-21 illustrates the state machine for the global bus controller.
The GBC asserts (drives low) the BUSGRANT signal associated with the 'C40
that wins an arbitration contest. This BUSGRANT signal remains active until
another processor requests access to the shared bus and a time-out signal
has been received. The new contestant is not granted access to the shared
bus until the current bus master deasserts (brings high) its BUSREQ
signal, indicating that it has finished its priority accesses or its
single nonpriority memory access. The system RESET signal should also be
an input to the GBC PLD; it should clear (deassert high) all BUSGRANT
signals and return the GBC state machine to the idle state.

The time-out signal is also a necessary input to the GBC because of the
high speed of bus arbitration. Before taking a BUSGRANT signal away, the
GBC must guarantee that the arbitration winner has had a chance to see the
BUSGRANT and start using the bus. The time-out signal is generated by a
counter implemented with a 16R4 PLD. The counter starts counting when a
processor first receives a BUSGRANT signal. It counts four cycles and then
issues a time-out signal to the GBC, indicating that the GBC can take away
the current master's BUSGRANT if necessary. Hence, the time-out counter
gives a 'C40's first-level logic at least four cycles to see a BUSGRANT
and start using the bus before the GBC can take the BUSGRANT away. Figure
13-23 contains the ABEL PLD equations for the time-out counter.
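The counter's behavior can be sketched in a few lines of Python. This is a
model of the behavior described above, not a transcription of the Figure
13-23 equations. The choices that the count restarts whenever a different
'C40 receives BUSGRANT, and that the count saturates once the time-out is
reached, are assumptions.

```python
from typing import Optional


class GrantTimeout:
    """Behavioral sketch of the bus-grant time-out counter.

    The count restarts whenever a different 'C40 receives BUSGRANT, and
    timeout is asserted once that grant has been held for TIMEOUT_CYCLES
    further clock cycles.
    """

    TIMEOUT_CYCLES = 4

    def __init__(self) -> None:
        self.owner: Optional[int] = None  # 'C40 (1-4) holding BUSGRANT
        self.count = 0

    @property
    def timeout(self) -> bool:
        return self.owner is not None and self.count >= self.TIMEOUT_CYCLES

    def clock(self, granted: Optional[int]) -> bool:
        """Advance one clock; granted is the 'C40 holding BUSGRANT, or None."""
        if granted != self.owner:
            self.owner = granted          # new grant: restart the count
            self.count = 0
        elif granted is not None and self.count < self.TIMEOUT_CYCLES:
            self.count += 1               # saturate at TIMEOUT_CYCLES
        return self.timeout


if __name__ == "__main__":
    t = GrantTimeout()
    print([t.clock(1) for _ in range(6)])  # grant to 'C40 1 held
    print(t.clock(3))                      # new owner: count restarts
```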

This GBC example implements a rotating priority scheme, which provides
fair arbitration among the four 'C40s sharing the global bus. In a
rotating priority scheme, the last bus master becomes the lowest-priority
(last serviced) processor. The processors rotate sequentially through the
priority list, with the least recently serviced processor having the
highest priority in subsequent arbitration contests. The priority rotates
every time the bus request of the current bus master goes inactive while
another processor requests access to shared memory. At system reset, the
priorities are 1, 2, 3, and 4, with 1 being the highest (first serviced)
priority.
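The following Python sketch takes the description literally and keeps the
priority list in least-recently-serviced order. Each winner moves to the
end of the list. The 16R8 equations in Figure 13-22 may encode the
rotation differently, so treat this as an interpretation of the prose, not
of the PLD.

```python
from typing import List, Optional, Set, Tuple


def arbitrate(priority: List[int],
              requests: Set[int]) -> Tuple[Optional[int], List[int]]:
    """One rotating-priority arbitration contest.

    priority -- 'C40 numbers, highest (least recently serviced) first
    requests -- 'C40 numbers whose BUSREQ is active
    Returns (winner, new_priority); winner is None if nobody is requesting.
    """
    for cpu in priority:
        if cpu in requests:
            # The winner becomes the lowest-priority (last-serviced) processor.
            rest = [p for p in priority if p != cpu]
            return cpu, rest + [cpu]
    return None, list(priority)


if __name__ == "__main__":
    order = [1, 2, 3, 4]          # priorities at system reset
    for reqs in ({2, 4}, {1, 2}, {2, 4}, {2}):
        winner, order = arbitrate(order, reqs)
        print(f"requests {sorted(reqs)} -> 'C40 {winner}, next order {order}")
```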

Figure 13-22 shows PLD equations for programming the 16R8 PLD used 
to implement the rotating priority global bus controller. ABEL's PLD lan- 
guage is used to describe the state machine shown in Figure 13-21. 
Note: Active-Low Indicators

In listings (e.g., Figure 13-22), an underscore following a signal name
(e.g., busreq_) indicates that the signal is active low. In regular text,
such signals are overbarred (e.g., BUSREQ).

Four of the PLD's outputs are the busgrant_ lines, each giving a different
'C40 access to the shared bus. These four bits also form half of the
output state bits; the other four state bits indicate the ready state
corresponding to each busgrant state. At reset, the GBC state machine goes
to the idle state, and all busgrant_ signals are inactive. In the idle
state, br_ signals can be received from any of the four first-level PLDs.
After arbitration, the state machine transitions to one of the grx states
(where x = 1, 2, 3, or 4), and the corresponding busgrantx_ output signal
goes active. The GBC stays in that state until it receives both a
busrequest_ from another processor's first-level PLD and a time-out
signal. The state machine then transitions to the corresponding bryx
state, in which the busgrantx_ signal goes inactive. The GBC stays in this
state until the corresponding bus request (brx_) input goes inactive high,
indicating that the current bus master has relinquished control of the
shared bus. When brx_ goes inactive, the state machine changes to the gry
state (where y = 1, 2, 3, or 4) of the highest-priority processor whose
br_ signal is active.
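The GBC transitions just described can be sketched in Python as a small
class with states ("idle", None), ("gr", x), and ("bry", x). It is not a
transcription of Figure 13-22. Priority is kept as a
least-recently-serviced list, and falling back to idle when no request
remains after a release is an assumption.

```python
from typing import Dict, Optional, Set, Tuple

State = Tuple[str, Optional[int]]   # ("idle", None), ("gr", x) or ("bry", x)


class GlobalBusController:
    """Behavioral sketch of the rotating-priority GBC state machine."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.state: State = ("idle", None)
        self.priority = [1, 2, 3, 4]     # 1 = highest at reset

    def _grant(self, requests: Set[int]) -> State:
        # Grant the highest-priority requester; it then drops to last place.
        for cpu in self.priority:
            if cpu in requests:
                self.priority = [p for p in self.priority if p != cpu] + [cpu]
                return ("gr", cpu)
        return ("idle", None)            # ASSUMPTION: nobody left requesting

    def clock(self, requests: Set[int], timeout: bool) -> Dict[int, int]:
        """Advance one clock; requests = 'C40s whose br_ is active."""
        kind, x = self.state
        if kind == "idle":
            self.state = self._grant(requests)
        elif kind == "gr" and (requests - {x}) and timeout:
            self.state = ("bry", x)      # withdraw busgrantx_
        elif kind == "bry" and x not in requests:
            self.state = self._grant(requests)
        return self.busgrant()

    def busgrant(self) -> Dict[int, int]:
        """Active-low busgrant_ outputs: 0 only for the granted 'C40."""
        kind, x = self.state
        return {cpu: 0 if (kind == "gr" and cpu == x) else 1
                for cpu in (1, 2, 3, 4)}


if __name__ == "__main__":
    gbc = GlobalBusController()
    for reqs, tmo in [({1}, False), ({1, 3}, False), ({1, 3}, True),
                      ({1, 3}, True), ({3}, False)]:
        gbc.clock(reqs, tmo)
        print(gbc.state)
```

In this run, 'C40 1 keeps the bus until the time-out arrives, its grant is
then withdrawn (bry1), and 'C40 3 is granted the bus only after 'C40 1
releases its request.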
Figure 13-20. PLD Equations for Programming the 16R4 PLD

module c40_global_bus_interface
title '
DWG
DWG #
COMPANY TEXAS INSTRUMENTS INCORPORATED'

c40u1        device 'P16R4';

"inputs
h3           Pin 1;
bg_          Pin 7;
busrdy_      Pin 8;    "busrdy from global int
busenable_   Pin 9;    "busenable from global PAL

"outputs
ctrl_enable_ Pin 18;   "enable signal for cont
rdy_         Pin 17;   "rdy signal for shared
sync_ae_     Pin 15;   "synchronized busenable
bg_sync_     Pin 14;

"name substitutions
CE_  = ctrl_enable_;
bry_ = rdy_;

"substitutions for test vectors
c,H,L,X = .C.,1,0,.X.;

equations
sync_ae_      := busenable_;
!ctrl_enable_ = !sync_ae_ & !busenable_;
rdy_          := busrdy_;
bg_sync_      := bg_;

@page
"Test 1st level global arbitration logic
test_vectors

([h3, busenable_, busrdy_, bg_] -> [c
[c, H, H, H] -> [H, H, H];
[c, H, H] -> [H, H];
[c, L, H, L] -> [L, H, L];
[L, L, L, H] -> [L, H, L];
[c, L, L, H] -> [L, L, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[L, L, H, L] -> [H, H, H];
[c, L, H, H] -> [L, H, H];
[c, L, H, H] -> [L, H, H];
[c, L, L, H] -> [L, L, H];
[c, L, H, L] -> [L, H, L];
[c, L, L, L] -> [L, L, L];
[L, H, H, H] -> [H, L, L];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, L] -> [H, H, L];
@page
[c, H, H, L] -> [H, H, L];
[c, H, H, L] -> [H, H, L];
[c, H, H, L] -> [H, H, L];
[c, H, H, L] -> [H, H, L];
[c, L, H, H] -> [L, H, H];
[c, L, H, H] -> [L, H, H];
[L, L, L, H] -> [L, H, H];
[c, L, L, H] -> [L, L, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[c, H, H, H] -> [H, H, H];
[L, L, H, H] -> [H, H, H];
[c, L, H, H] -> [L, H, H];
[c, L, H, H] -> [L, H, H];
[c, L, L, H] -> [L, L, H];
[c, L, H, H] -> [L, H, H];
[c, L, L, H] -> [L, H];

end c40_global_bus_interface
Figure 13-21. Global Bus Controller PLD (Rotating Priority Mode Only)
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Figure 13-22. PLD Equations for Programming the 16R8 PLD 



module global_bus_cntrl
title '
DWG NAME   Shared bus interface
DWG #

COMPANY    TEXAS INSTRUMENTS INCORPORATED'

xub5       device 'P16R8';

h50        Pin 1;     "50 MHz clock
br1_       Pin 2;     "bus request 1
br2_       Pin 3;     "bus request 2
br3_       Pin 4;     "bus request 3
br4_       Pin 5;     "bus request 4
reset_     Pin 6;     "reset
fix_rot_   Pin 7;     "fix/rot_, not used
oe         Pin 11;
timeout_   Pin 8;
vss        Pin 10;

bg1        Pin 19;    "grant 1
bg2        Pin 18;    "grant 2
bg3        Pin 17;    "grant 3
bg4        Pin 16;    "grant 4
s3         Pin 15;    "state 3
s2         Pin 14;    "state 2
s1         Pin 13;    "state 1
s0         Pin 12;    "state 0
vcc        Pin 20;

c, H, L, X = .C., 1, 0, .X.;

"define state machine bits
bus_state = [s3, s2, s1, s0, bg4, bg3, bg2, bg1];

"states
bry1 = ^b01111111;    "ready 1
bry2 = ^b10111111;    "ready 2
bry3 = ^b11011111;    "ready 3
bry4 = ^b11101111;    "ready 4
idle = ^b11111111;    "idle state

gr1  = ^b11111110;    "grant 1
gr2  = ^b11111101;    "grant 2
gr3  = ^b11111011;    "grant 3
gr4  = ^b11110111;    "grant 4

"convert inputs to positive logic
br1 = !br1_;
br2 = !br2_;
br3 = !br3_;
br4 = !br4_;
reset = !reset_;

@page
Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued)

state_diagram bus_state

state idle:
    if (reset) then idle
    else if (!reset & !br1 & !br2 & !br3 & br4) then gr4
    else if (!reset & !br1 & !br2 & br3) then gr3
    else if (!reset & !br1 & br2) then gr2
    else if (!reset & br1) then gr1
    else idle;

state bry4:
    if (reset) then idle
    else if (!reset & br4) then bry4
    else if (!reset & !br1 & !br2 & !br3 & !br4) then idle
    else if (!reset & !br1 & !br2 & br3 & !br4) then gr3
    else if (!reset & !br1 & br2 & !br4) then gr2
    else if (!reset & br1 & !br4) then gr1;

state bry3:
    if (reset) then idle
    else if (!reset & br3) then bry3
    else if (!reset & !br4 & !br1 & !br2 & !br3) then idle
    else if (!reset & !br4 & !br1 & br2 & !br3) then gr2
    else if (!reset & !br4 & br1 & !br3) then gr1
    else if (!reset & br4 & !br3) then gr4;

state bry2:
    if (reset) then idle
    else if (!reset & br2) then bry2
    else if (!reset & !br3 & !br4 & !br1 & !br2) then idle
    else if (!reset & !br3 & !br4 & br1 & !br2) then gr1
    else if (!reset & !br3 & br4 & !br2) then gr4
    else if (!reset & br3 & !br2) then gr3;

state bry1:
    if (reset) then idle
    else if (!reset & br1) then bry1
    else if (!reset & !br2 & !br3 & !br4 & !br1) then idle
    else if (!reset & !br2 & !br3 & br4 & !br1) then gr4
    else if (!reset & !br2 & br3 & !br1) then gr3
    else if (!reset & br2 & !br1) then gr2;

state gr4:
    if (!reset & (timeout # !br1 & !br2 & !br3)) then gr4
    else if (reset) then idle
    else bry4;

state gr3:
    if (!reset & (timeout # !br4 & !br1 & !br2)) then gr3
    else if (reset) then idle
    else bry3;

state gr2:
Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued)



    if (!reset & (timeout # !br3 & !br4 & !br1)) then gr2
    else if (reset) then idle
    else bry2;

state gr1:
    if (!reset & (timeout # !br2 & !br3 & !br4)) then gr1
    else if (reset) then idle
    else bry1;

@page


















test_vectors

"rotating priority vectors
([h50, br1, br2, br3, br4, timeout, reset] -> [bus_state])


0119 


1 "oH^olr 


for go 




IDLE 








[ c, X, X, X, X, X, L] -> [idle];


0121 


i r C 






T 

JU f 


Hf 


Xf 


n 




Lgxfi j f 


[ c, X, X, X, X, X, L] -> [idle];
[ c, L, L, H, X, X, H] -> [gr3];
[ c, X, X, X, X, X, L] -> [idle];
[ c, L, H, X, X, X, H] -> [gr2];
[ c, X, X, X, X, X, L] -> [idle];
[ c, H, X, X, X, X, H] -> [gr1];
[ c, X, X, X, X, X, L] -> [idle];
[ c, L, X, L, H, X, H] -> [gr4];
[ c, H, L, L, H, L, H] -> [bry4];
[ c, X, X, X, X, X, L] -> [idle];
[ c, L, L, H, X, X, H] -> [gr3];
[ c, L, H, H, L, L, H] -> [bry3];
[ c, X, X, X, X, X, L] -> [idle];
[ c, L, H, X, X, X, H] -> [gr2];
[ c, L, H, H, H, L, H] -> [bry2];
[ c, X, X, X, X, X, L] -> [idle];
[ c, H, X, X, X, X, H] -> [gr1];
[ c, H, H, H, H, L, H] -> [bry1];
[ c, X, X, X, X, X, L] -> [idle];

[ c, X, X, X, L, X, H] -> [idle];
[ c, L, L, H, X, X, H] -> [gr3];
[ c, X, X, L, X, X, H] -> [idle];
[ c, L, H, X, X, X, H] -> [gr2];
[ c, X, L, X, X, X, H] -> [idle];
[ c, H, X, X, X, X, H] -> [gr1];
[ c, L, X, X, X, X, H] -> [idle];
[ c, L, L, L, H, X, H] -> [gr4];
[ c, H, L, L, H, L, H] -> [bry4];
Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued)



[ c, L, L, L, L, X, H] -> [idle];
[ c, L, L, H, X, X, H] -> [gr3];
[ c, L, H, H, L, L, H] -> [bry3];
[ c, L, L, L, L, X, H] -> [idle];
[ c, L, H, X, X, X, H] -> [gr2];
[ c, L, H, H, H, L, H] -> [bry2];
[ c, L, L, L, L, X, H] -> [idle];
[ c, H, X, X, X, X, H] -> [gr1];
[ c, H, H, H, H, L, H] -> [bry1];
[ c, L, L, L, L, X, H] -> [idle];
"vector 7
[ c, H, X, X, X, X, H] -> [gr1];
[ c, H, X, X, X, H, H] -> [gr1];
[ c, H, L, L, L, L, H] -> [gr1];
[ c, H, X, X, H, L, H] -> [bry1];
[ c, H, X, X, X, X, H] -> [bry1];
[ c, H, X, X, X, X, H] -> [bry1];
@page

"vector 15
[ c, L, L, L, H, X, H] -> [gr4];
[ c, L, L, L, H, X, H] -> [gr4];
[ c, L, L, L, H, X, H] -> [gr4];
[ c, X, X, X, H, H, H] -> [gr4];
[ c, X, X, H, X, L, H] -> [bry4];
[ c, X, X, X, H, X, H] -> [bry4];
[ c, X, X, X, H, X, H] -> [bry4];
"vector 21
[ c, L, L, H, L, X, H] -> [gr3];
[ c, L, L, H, L, X, H] -> [gr3];
[ c, L, L, H, L, H, H] -> [gr3];
[ c, X, X, H, X, H, H] -> [gr3];
[ c, X, H, H, X, L, H] -> [bry3];
[ c, X, X, H, X, X, H] -> [bry3];
[ c, X, X, H, X, X, H] -> [bry3];
"vector 27
[ c, L, H, L, L, X, H] -> [gr2];
[ c, L, H, L, L, X, H] -> [gr2];
[ c, X, H, X, X, H, H] -> [gr2];
[ c, L, H, L, L, L, H] -> [gr2];
[ c, H, H, X, X, L, H] -> [bry2];
[ c, X, H, X, X, X, H] -> [bry2];
[ c, X, H, X, X, X, H] -> [bry2];
"vector 33
[ c, H, L, L, L, X, H] -> [gr1];
Figure 13-22. PLD Equations for Programming the 16R8 PLD (Concluded)



[ c, H, X, H, X, L, H] -> [bry1];
[ c, L, L, H, X, X, H] -> [gr3];
[ c, H, X, H, X, L, H] -> [bry3];
[ c, H, X, L, L, X, H] -> [gr1];
[ c, H, H, X, X, L, H] -> [bry1];
[ c, L, H, X, X, X, H] -> [gr2];
[ c, X, H, X, H, L, H] -> [bry2];
[ c, X, L, L, H, X, H] -> [gr4];
[ c, X, H, X, H, L, H] -> [bry4];
[ c, L, H, X, L, X, H] -> [gr2];
[ c, X, H, H, X, L, H] -> [bry2];
"vector 45
[ c, X, L, H, X, X, H] -> [gr3];
[ c, X, X, H, H, L, H] -> [bry3];
[ c, X, X, L, H, X, H] -> [gr4];
[ c, H, X, X, H, L, H] -> [bry4];
[ c, H, X, X, L, X, H] -> [gr1];
[ c, H, H, X, X, L, H] -> [bry1];

end global_bus_cntrl
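The rotating-priority behavior in the Figure 13-22 state diagram can be checked with a small Python model. This is a behavioral sketch under stated assumptions, not the PLD implementation. It treats br1 through br4 and reset as positive logic (True = asserted). `timeout_hi` is assumed to be the raw level of the timeout input (True = high, meaning the time slice has not expired). The function names are illustrative only.

Out of idle, the service order is 1, 2, 3, 4. After processor N has held the bus, the order is N+1, N+2, N+3 (modulo 4). The replay at the end uses a few of the listed test vectors, with don't-care (X) inputs taken as low.

```python
# Behavioral sketch of the rotating-priority global bus controller.
# State names follow Figure 13-22; function names are illustrative.

def rotation_after(n):
    """Service order after processor n (1-4) releases the bus."""
    return [((n - 1 + k) % 4) + 1 for k in (1, 2, 3)]

def next_state(state, br, timeout_hi, reset):
    """Next state for requests br = (br1, br2, br3, br4), all positive logic."""
    if reset:
        return "idle"
    req = {p: bool(br[p - 1]) for p in (1, 2, 3, 4)}
    if state == "idle":
        for p in (1, 2, 3, 4):              # fixed order out of idle
            if req[p]:
                return f"gr{p}"
        return "idle"
    kind, n = state[:-1], int(state[-1])
    if kind == "gr":
        others_waiting = any(req[p] for p in (1, 2, 3, 4) if p != n)
        # Keep the grant until the time slice expires AND another
        # processor is requesting the bus.
        if timeout_hi or not others_waiting:
            return state
        return f"bry{n}"
    if kind == "bry":
        if req[n]:                          # current master still requesting
            return state
        for p in rotation_after(n):
            if req[p]:
                return f"gr{p}"
        return "idle"
    raise ValueError(f"unknown state {state!r}")

def replay(start, vectors):
    """Apply (br, timeout_hi, reset) tuples in order; return visited states."""
    state, trace = start, []
    for br, timeout_hi, reset in vectors:
        state = next_state(state, br, timeout_hi, reset)
        trace.append(state)
    return trace

H, L = True, False
vectors = [
    ((L, L, L, H), L, L),   # only br4 requests
    ((H, L, L, H), L, L),   # time slice over, br1 waiting
    ((L, L, L, L), L, L),   # nobody requesting
    ((L, L, H, L), L, L),
    ((L, H, H, L), L, L),
    ((L, L, L, L), L, L),
]
print(replay("idle", vectors))  # ['gr4', 'bry4', 'idle', 'gr3', 'bry3', 'idle']
```

The ready (bry) states let the current master finish and drop its request before the next grant is issued. The rotation then starts with the processor after it.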
Figure 13-23. PLD Equations for Programming the 16R6 PLD

module c40_global_timeout
title '
DWG NAME   global arbitration
DWG #

COMPANY    TEXAS INSTRUMENTS INCORPORATED

DATE'

c40u4      device 'P16R6';

" inputs
h50        Pin 1;
bg1_       Pin 2;
bg2_       Pin 3;
bg3_       Pin 4;
bg4_       Pin 5;

"output
timeout_   Pin 13;
s1         Pin 16;
s0         Pin 15;

"name substitution to increase readability
bus_active = (!bg1_ # !bg2_ # !bg3_ # !bg4_);

"define machine state bits
outstate = [timeout_, s1, s0];

"states
idle   = ^b111;
count1 = ^b110;
count2 = ^b101;
count3 = ^b100;
time   = ^b011;

c, H, L, X = .C., 1, 0, .X.;

state_diagram outstate
state idle:
    if (!bus_active) then idle






Figure 13-23. PLD Equations for Programming the 16R6 PLD (Continued) 



    else count1;

state count1:
    if (!bus_active) then idle
    else count2;

state count2:
    if (!bus_active) then idle
    else count3;

state count3:
    if (!bus_active) then idle
    else time;

state time:
    GOTO idle;

@page

"Test counter
test_vectors
([h50, bg1_, bg2_, bg3_, bg4_] -> [outstate])
[ c, H, H, H, H] -> [idle];
[ c, L, H, H, H] -> [count1];
[ c, H, H, H, H] -> [idle];
[ c, L, H, H, H] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, H, H, H, H] -> [idle];
[ c, L, H, H, H] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, X, X, X, X] -> [count3];
[ c, H, H, H, H] -> [idle];
[ c, L, H, H, H] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, X, X, X, X] -> [count3];

[ c, X, X, X, X] -> [idle];
[ c, H, L, H, H] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, X, X, X, X] -> [count3];
[ c, X, X, X, X] -> [time];
[ c, X, X, X, X] -> [idle];
[ c, H, H, L, H] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, X, X, X, X] -> [count3];
[ c, X, X, X, X] -> [time];
[ c, X, X, X, X] -> [idle];
[ c, H, H, H, L] -> [count1];
[ c, X, X, X, X] -> [count2];
[ c, X, X, X, X] -> [count3];
[ c, X, X, X, X] -> [time];
[ c, X, X, X, X] -> [idle];

end c40_global_timeout
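The time-slice counter of Figure 13-23 can be sketched in Python as follows. This is a behavioral approximation only. It assumes the state codes [timeout_, s1, s0] listed in the figure, and it assumes that any low bgN_ input means the bus is active. The function names are illustrative.

With one grant held low, the counter passes through count1, count2, and count3. On the fourth clock it reaches the time state, where timeout_ is driven low. On the next clock it returns unconditionally to idle.

```python
# Sketch of the Figure 13-23 time-slice counter.
# State codes are [timeout_, s1, s0] as listed in the figure.
STATE_CODES = {"idle": 0b111, "count1": 0b110, "count2": 0b101,
               "count3": 0b100, "time": 0b011}
ADVANCE = {"idle": "count1", "count1": "count2",
           "count2": "count3", "count3": "time"}

def bus_active(bg_n):
    """bg_n holds the levels of bg1_..bg4_ (False = low = grant asserted)."""
    return any(not level for level in bg_n)

def step(state, bg_n):
    """Next state on one rising edge of the h50 clock."""
    if state == "time":
        return "idle"                  # unconditional GOTO idle
    if not bus_active(bg_n):
        return "idle"
    return ADVANCE[state]

def timeout_n(state):
    """Level of the active-low timeout_ output (MSB of the state code)."""
    return (STATE_CODES[state] >> 2) & 1

state, trace = "idle", []
for _ in range(5):                     # bg1_ held low for five clocks
    state = step(state, (False, True, True, True))
    trace.append((state, timeout_n(state)))
print(trace)  # [('count1', 1), ('count2', 1), ('count3', 1), ('time', 0), ('idle', 1)]
```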



13.6.2 Arbitration Alternatives 

If more arbitration flexibility is desired, a fixed-priority mode can be implemented in the global bus controller PLD. A fixed scheme can be used in conjunction with this rotating-priority mode if a fixed/rotating input is added to the GBC PLD to select either of the two arbitration methods. One of the spare IIOF pins can be configured as a general-purpose output pin to act as the arbitration mode control pin. For example, if FIX/ROT (IIOF2) = 0, the four 'C40s have rotating priorities; if FIX/ROT = 1, the four processors have fixed priorities. To reduce state machine complexity, the rotating priorities can be preset at system reset to the same values as in the fixed arbitration mode. The processors then have priorities of 1, 2, 3, and 4, with 1 being the highest (first serviced) priority.
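The fixed/rotating selection described above can be sketched in Python. This is a minimal sketch with several assumptions:

- A single FIX/ROT level selects the mode (1 = fixed, 0 = rotating).
- After reset, the order is preset to the fixed order 1, 2, 3, 4.
- The last bus master is simply placed at the end of the rotating order.

The function names are hypothetical. The actual rotating GBC also waits in a ready state until the previous master drops its request.

```python
def service_order(fix_rot, last_master):
    """Order in which pending bus requests are serviced.

    fix_rot: 1 selects fixed priorities, 0 selects rotating priorities.
    last_master: processor (1-4) that last owned the bus, or None after reset.
    """
    if fix_rot == 1 or last_master is None:
        return [1, 2, 3, 4]            # fixed order, also the preset after reset
    # Rotating: start with the processor after the last master.
    return [((last_master - 1 + k) % 4) + 1 for k in range(1, 5)]

def arbitrate(requests, fix_rot, last_master):
    """Return the processor to grant next, or None if nobody is requesting."""
    for p in service_order(fix_rot, last_master):
        if p in requests:
            return p
    return None

print(arbitrate({1, 4}, fix_rot=0, last_master=2))  # rotating order 3, 4, 1, 2 -> 4
print(arbitrate({1, 4}, fix_rot=1, last_master=2))  # fixed order 1, 2, 3, 4 -> 1
```

Presetting the rotating order to the fixed order at reset means both modes start from the same state, which is the simplification the text suggests.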



13.6.3 Global Bus Arbitration and Transfer Timing 

To illustrate the timing involved with global bus arbitration and data transfers, Figure 13-17 (page 13-47), Figure 13-24 (page 13-72), Figure 13-25, and Figure 13-26 show shared-bus timings using the rotating-priority arbitration configuration.

These figures represent a 'C40 requesting a shared-bus access when it is not currently the bus master. Clock H1 is the output clock of the 'C40 requesting access to the bus. Both clocks H1 and H3 have a rate of 25 MHz. However, the global bus controller (GBC) input clock is asynchronous with respect to H1 and H3 and has a rate of 50 MHz.
Due to the arbitration logic synchronizer delays and the 35-ns SRAMs, access to the shared memory requires wait states. A new bus master's first memory access after an arbitration win takes at least five H1 cycles. These five cycles are measured from the status lines going active to the end of the read or write cycle. Subsequent reads or writes take only two H1 cycles.

Two-cycle memory accesses allow enough time for control signals to go active and inactive to complete read or write cycles for 35-ns memories. They also allow one processor to stop driving the bus before another processor starts driving it after a bus arbitration contest. In addition, two-cycle accesses leave enough time for signal buffering between the processor bus and memory (buffer delays are less than 15 ns with commercially available parts).
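The cycle counts above translate directly into a lower bound on transfer time. At an H1 rate of 25 MHz (40-ns period), the first access costs at least 200 ns and each following access costs 80 ns.

The sketch below assumes back-to-back accesses with no lost arbitration in between. The function name and parameters are illustrative.

```python
def shared_access_time_ns(n_accesses, h1_hz=25e6, first_cycles=5, next_cycles=2):
    """Lower bound on the time for n back-to-back shared-bus accesses
    made right after an arbitration win."""
    if n_accesses < 1:
        return 0.0
    h1_period_ns = 1e9 / h1_hz         # 40 ns at 25 MHz
    cycles = first_cycles + next_cycles * (n_accesses - 1)
    return cycles * h1_period_ns

for n in (1, 2, 4, 8):
    print(n, shared_access_time_ns(n))  # 200.0, 280.0, 440.0, 760.0 ns
```

The five-cycle first access matters less as bursts get longer, because the per-access cost approaches the two-cycle steady state.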

In Figure 13-17 (page 13-47), a 'C40 wins an arbitration contest immediately and does one read cycle. However, it loses arbitration for the next transfer on the shared bus (busgrant_ goes inactive high). The first-level PLD then brings its busrequest_ signal inactive high to tell the GBC that it has given up the bus.

At the same time, the first-level PLD sends bus disable signals (BUSENABLE and CTLENABLE high) to the AE, DE, and CE pins of the 'C40 to three-state the bus. The first-level PLD three-states the bus immediately because the GBC will give another processor access to the shared bus as soon as it sees this BUSREQUEST and a time-out go inactive.

Figure 13-24 shows a successful arbitration contest followed by successive reads. The 'C40 is allowed to do successive reads on the shared bus because no other processor requests access (busgrant stays active).

Figure 13-25 illustrates an arbitration win followed by a single write. Figure 13-26 shows an arbitration win followed by successive writes and an arbitration loss.

The second write is allowed to occur because the first-level PLD, which synchronizes on the rising edge of H1, misses busgrant_ going inactive. The first-level PLD transitions to the do_cycle state because STRB is high and the PLD has not yet seen busgrant_ go inactive at the synchronizer output. Even though the first-level PLD sees that busgrant_ is taken away during the next H1/H3 cycle, it does not release its busrequest_ until the end of the second write cycle. Then busrequest_ is made inactive, and the bus is disabled.
Figure 13-24. Successful TMS320C40 Arbitration; Data Read; Data Read
(Timing diagram. First-level PLD states: idle, req, req, start, txfr, park, txfr, park. Signals: STAT(3-0), busreq, bg, be, brdy, A(30-0) (high impedance, then two valid addresses), and D(31-0) (two valid data reads).)
Figure 13-25. Successful TMS320C40 Arbitration and Data Write From Shared Bus Memory Followed by an Unsuccessful Arbitration Contest
(Timing diagram. Signals: STAT(3-0) (valid memory access request, then pending memory access), br, be, rdy, A(30-0) (high impedance, valid address, high impedance), R/W, STRB, and D(31-0) (high impedance, valid data, high impedance).)
Figure 13-26. Successful 'C40 Arbitration; Consecutive Data Writes; Arbitration Win Followed by Successive Writes and an Arbitration Loss
(Timing diagram. Signals: H1, STAT(3-0), busreq, be, brdy, A(30-0) (two valid addresses), STRB, and D(31-0) (two valid write data phases, then high impedance).)



13.6.4 Arbitration Protocol Limitations 

This shared-bus arbitration protocol uses handshaking between the GBC and the processors sharing the global bus to ensure that only one processor drives the bus at any given time. Nonetheless, the global bus controller should not allow another processor to become bus master until the previous master is guaranteed to have released the bus completely.

Since 'C40s have a bus disable (AE, DE, or CE) time of less than 15 ns, bus turnoff time is not critical unless the GBC input clock frequency is greater than 50 MHz. However, if processors with slower turnoff times are used in a shared-bus configuration with this protocol, the GBC input clock period cannot be less than the bus disable time of the slowest processor in the system. If the GBC input clock period is less than a processor's disable time, the GBC could give a new master ownership of the bus before the previous master is off the bus.
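The turnoff rule above reduces to a single comparison: the GBC clock period must be at least the largest bus disable time on the shared bus. The sketch below uses the 15-ns 'C40 figure from the text. The 25-ns device is a hypothetical slower part added for illustration. Real disable times should come from each device's data sheet.

```python
def gbc_clock_is_safe(gbc_clock_hz, disable_times_ns):
    """GBC clock period must not be shorter than the bus disable time
    of the slowest processor on the shared bus."""
    period_ns = 1e9 / gbc_clock_hz
    return period_ns >= max(disable_times_ns)

def max_gbc_clock_hz(disable_times_ns):
    """Highest GBC clock frequency that still satisfies the rule above."""
    return 1e9 / max(disable_times_ns)

print(gbc_clock_is_safe(50e6, [15, 15, 15, 15]))  # four 'C40s: 20 ns >= 15 ns -> True
print(gbc_clock_is_safe(50e6, [15, 15, 15, 25]))  # hypothetical 25-ns device -> False
```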
13.7 Reset Signal Generation Control Function



Several aspects of 'C40 system hardware design are critical to overall system operation. One such function is reset signal generation.

The reset input controls initialization of internal 'C40 logic and also causes execution of the system initialization software. For proper system initialization, the reset signal must be applied for at least ten H1 cycles, that is, 400 ns for a 'C40 operating at 50.00 MHz. Upon powerup, however, it can take 20 ms or more before the system oscillator reaches a stable operating state. Therefore, the powerup reset circuit should generate a low pulse on the reset line for 100 to 200 ms.

Once a proper reset pulse has been applied, the processor fetches the reset vector from location zero, which contains the address of the system initialization routine. Figure 13-27 shows a circuit that generates an appropriate powerup or push-button reset signal.
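The 400-ns minimum quoted above follows from the H1 period. The sketch below assumes that H1 runs at half the input clock. This is consistent with the 400-ns figure for a 50-MHz 'C40 and with the 25-MHz H1 rate given in Section 13.6.3. The function name is illustrative.

```python
def min_reset_low_ns(clkin_hz, h1_cycles=10):
    """Minimum RESET low time, assuming H1 runs at half the input clock."""
    h1_period_ns = 2 * 1e9 / clkin_hz
    return h1_cycles * h1_period_ns

print(min_reset_low_ns(50e6))  # 400.0 ns
```

The powerup pulse is governed by oscillator startup (20 ms or more), not by this minimum, which is why the recommended pulse is 100 to 200 ms.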



The voltage on the reset pin (RESET) is controlled by the R1C1 network. After a reset, this voltage rises exponentially according to the time constant R1C1, as shown in Figure 13-28. In Figure 13-27, the 74ALS34 is used to provide a clean RESET signal to the 'C40.



Figure 13-27. Reset Circuit
(Schematic: reset network, with GND connection, driving the RESET input of the TMS320C40.)
Figure 13-28. Voltage on the TMS320C40 RESET Pin
(Plot of voltage versus time: V = VCC(1 - e^(-t/τ)) rises from 0 at t0 = 0 toward VCC; t1 is marked on the time axis.)



The duration of the low pulse on the RESET pin is approximately t1, which is the time it takes for capacitor C1 to charge to 1.5 V. This is approximately the voltage at which the reset input switches from a logic 0 to a logic 1. The capacitor voltage is expressed as

$$V = V_{CC}\left(1 - e^{-t/\tau}\right) \qquad (5)$$

where τ = R1C1 is the reset circuit time constant. Solving (5) for t results in

$$t = -\tau \ln\left(1 - \frac{V}{V_{CC}}\right) \qquad (6)$$

Setting the following:
R1 = 100 kΩ
C1 = 4.7 μF
VCC = 5 V
V = V1 = 1.5 V

results in t = 167 ms. Therefore, the reset circuit of Figure 13-27 provides a low pulse long enough to ensure the stabilization of the system oscillator upon powerup.
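The numeric result can be reproduced by evaluating the charging relation V = VCC(1 - e^(-t/τ)) solved for t. The 1.5-V threshold is the approximation used in the text; the actual switching point of the RESET input varies from device to device. The function name is illustrative.

```python
import math

def rc_reset_time_s(r_ohms, c_farads, vcc, v_threshold):
    """Solve V = VCC * (1 - exp(-t / tau)) for t, with tau = R1 * C1."""
    tau = r_ohms * c_farads
    return -tau * math.log(1.0 - v_threshold / vcc)

t = rc_reset_time_s(100e3, 4.7e-6, 5.0, 1.5)
print(f"{t * 1e3:.1f} ms")   # 167.6 ms
```

The result falls inside the recommended 100-to-200-ms window and well beyond the 20 ms or more that the oscillator may need to stabilize.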
Note that if synchronization of multiple 'C40s is required, all processors should be provided with the same input clock and the same reset signal. After powerup, when the clock has stabilized, all processors can then be synchronized by generating a falling edge on the common reset signal.

Since it is the falling edge of RESET that establishes synchronization, RESET must initially be high for at least ten H1 cycles. Following the falling edge, RESET should remain low for at least ten H1 cycles and then be driven high. This sequencing of RESET can be accomplished with additional circuitry based on either RC time delays or counters.
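A counter-based RESET sequencer can be modeled as a list of RESET levels, one per H1 cycle, and then checked against the rules above. This is an illustrative sketch only: the function names are hypothetical, and real hardware would implement the counts with a counter clocked from the common clock.

```python
def reset_sync_sequence(high_cycles=10, low_cycles=10, tail_cycles=2):
    """RESET level per H1 cycle for a counter-based sequencer (1 = high)."""
    return [1] * high_cycles + [0] * low_cycles + [1] * tail_cycles

def meets_sync_rules(levels, min_cycles=10):
    """High for >= min_cycles before the first falling edge, low for
    >= min_cycles after it, and driven high again afterward."""
    fall = next((i for i in range(1, len(levels))
                 if levels[i - 1] == 1 and levels[i] == 0), None)
    if fall is None:
        return False
    high = 0
    i = fall - 1
    while i >= 0 and levels[i] == 1:       # high run just before the edge
        high += 1
        i -= 1
    low = 0
    j = fall
    while j < len(levels) and levels[j] == 0:  # low run after the edge
        low += 1
        j += 1
    driven_high_again = j < len(levels)
    return high >= min_cycles and low >= min_cycles and driven_high_again

print(meets_sync_rules(reset_sync_sequence()))               # True
print(meets_sync_rules(reset_sync_sequence(high_cycles=5)))  # False
```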
TMS320C4x Signal Descriptions and Electrical Characteristics

The sections in this chapter cover the following characteristics of the TMS320C4x:

Section                                   Page
14.1 Pinout and Pin Assignments           14-2
14.2 Signal Descriptions                  14-7
14.3 TMS320C4x Mechanical Data            14-11
14.4 Electrical Specifications            14-12
14.5 Signal Transition Levels             14-14
14.6 Timing                               14-15

Note: Advance Information

Unless otherwise noted, this chapter contains advance information on new products in the sampling or preproduction phases of development. Characteristic data and other specifications are subject to change without notice.
14.1 Pinout and Pin Assignments

The TMS320C40 (TMS320C4x generation) digital signal processor is available in a 325-pin grid array (PGA) package. The pinout of this package is shown in Figure 14-1. Pin assignments are listed in the following tables:

□ Table 14-1: Pins sorted by signal name (alphanumeric listing)

□ Table 14-2: Pins sorted by pin number (location on Figure 14-1)

□ Table 14-3: Pins sorted by function, describing each (page 14-7)

Figure 14-1. TMS320C40 Pinout (Bottom View)
Table 14-1. TMS320C40 Pin Assignments Sorted by Signal Name

Signal      Pin
A0          D32
A1          B32
A2          D30
A3          C29
A4          B30
A5          F28
A6          F24
A7          E29
A8          C27
A9          D28
A10         B28
A11         F26
A12         C25
A13         E27
A14         B26
A15         D26
A16         C23
A17         B24
A18         E25
A19         C21
A20         D24
A21         B22
A22         E23
A23         C19
A24         D22
A25         B20
A26         E21
A27         B18
A28         C17
A29         D20
A30         B16
AE          AG31
C0D0        AP4
C0D1        AL5
C0D2        AN5
C0D3        AM4
C0D4        AP6
C0D5        AM6
C0D6        AN7
C0D7        AK8
C1D0        AL7
C1D1        AP8
C1D2        AM8
C1D3        AK12
C1D4        AK10
C1D5        AN9
C1D6        AL9
C1D7        AP10
C2D0        AM18
C2D1        AN19
C2D2        AL19
C2D3        AP20
C2D4        AM20
C2D5        AN21
C2D6        AL21
C2D7        AP22
C3D0        AM22
C3D1        AN23
C3D2        AL23
C3D3        AP24
C3D4        AM24
C3D5        AN25
C3D6        AL25
C3D7        AP26
C4D0        AN27
C4D1        AM26
C4D2        AK24
C4D3        AL27
C4D4        AP28
C4D5        AK26
C4D6        AN29
C4D7        AM28
C5D0        AL29
C5D1        AP30
C5D2        AK28
C5D3        AN31
C5D4        AM30
C5D5        AP32
C5D6        AM32
C5D7        AL31
CACK0       AN11
CACK1       AN13
CACK2       AM14
CACK3       AM16
CACK4       AK32
CACK5       AJ31
CE0         AA33
CE1         V34
CRDY0       AP12
CRDY1       AP14
CRDY2       AL15
CRDY3       AL17
CRDY4       AH30
CRDY5       AH32
CREQ0       AM10
CREQ1       AM12
CREQ2       AN15
CREQ3       AN17
CREQ4       AN33
CREQ5       AL33
CSTRB0      AL11
CSTRB1      AL13
CSTRB2      AP16
CSTRB3      AP18
CSTRB4      AM34
CSTRB5      AK34
CVSS        AR19
CVSS        AR7
CVSS        N1
CVSS        AL35
CVSS        A27
CVSS        A9
CVSS        E1
CVSS        J35
CVSS        E35
CVSS        AR25
CVSS        AE1
CVSS        AR13
CVSS        A19
CVSS        R35
CVSS        AL1
D0          U33
D1          V32
D2          T34
D3          U31
D4          R33
D5          P34
D6          T32
D7          N33
D8          R31
D9          M34
D10         P32
D11         L33
D12         N31
D13         K34
D14         M32
D15         J33
D16         L31
D17         M30
D18         K32
D19         H34
D20         J31
D21         G33
D22         K30
D23         F34
D24         H32
D25         E33
D26         D34
D27         G31
D28         C33
D29         H30
D30         E31
D31         F32
DE          AA31
DVDD        AR11
DVDD        AR29
DVDD        A13
DVDD        A7
DVDD        A17
DVDD        L35
DVDD        AR23
DVDD        A29
DVDD        L1
DVDD        AC1
DVDD        AR17
DVDD        A23
DVDD        AJ1
DVSS        AJ35
DVSS        A21
DVSS        A25
DVSS        G35
DVSS        A11
DVSS        AG1
DVSS        AM2
DVSS        R1
DVSS        AR21
DVSS        AR15
DVSS        A15
DVSS        AR27
DVSS        G1
DVSS        N35
DVSS        AR9
EMU0        AA35
EMU1        AD34
GADVDD      B2
GADVDD      AR1
GADVDD      U35
GDDVDD      V2
GDDVDD      A35
GDDVDD      A1
Table 14-1. TMS320C40 Pin Assignments Sorted by Signal Name (Concluded)

Signal      Pin
H1          AC3
H3          AC5
IACK        W3
IIOF0       AN3
IIOF1       AL3
IIOF2       AH6
IIOF3       AK2
IVSS        AR5
IVSS        AR31
IVSS        AG35
IVSS        A31
IVSS        J1
IVSS        A5
LA0         D2
LA1         D4
LA2         E3
LA3         F4
LA4         H6
LA5         F2
LA6         G5
LA7         G3
LA8         H4
LA9         H2
LA10        K6
LA11        M6
LA12        J5
LA13        J3
LA14        K4
LA15        K2
LA16        L3
LA17        L5
LA18        M2
LA19        M4
LA20        N3
LA21        N5
LA22        P2
LA23        P4
LA24        R3
LA25        R5
LA26        T2
LA27        U3
LA28        T4
LA29        V4
LA30        U5
LADVDD      B34
LADVDD      AB2
LADVDD      AP34
LAE         AB4
LCE0        AG5
LCE1        AF2
LD0         E19
LD1         C15
LD2         D18
LD3         B14
LD4         E17
LD5         D16
LD6         C13
LD7         E15
LD8         B12
LD9         D14
LD10        C11
LD11        E13
LD12        B10
LD13        D12
LD14        C9
LD15        E11
LD16        F12
LD17        D10
LD18        B8
LD19        E9
LD20        C7
LD21        F10
LD22        B6
LD23        D8
LD24        C5
LD25        E7
LD26        B4
LD27        F8
LD28        D6
LD29        C3
LD30        E5
LD31        F6
LDDVDD      AR35
LDDVDD      AP2
LDDVDD      U1
LDE         AD4
LLOCK       AA5
LOCK        W33
LPAGE0      AH2
LPAGE1      AG3
LRDY0       AF6
LRDY1       AE5
LR/W0       AH4
LR/W1       AF4
LSTAT0      AA3
LSTAT1      Y4
LSTAT2      Y2
LSTAT3      W5
LSTRB0      AJ3
LSTRB1      AD6
NMI         AJ5
PAGE0       AG33
PAGE1       AB32
RDY0        Y32
RDY1        W31
RESETLOC0   AF30
RESETLOC1   AH34
RESET       AJ33
ROMEN       AK4
R/W0        AF32
R/W1        AC31
STAT0       AD32
STAT1       AE33
STAT2       AF34
STAT3       AE31
STRB0       AD30
STRB1       AC33
SUBS        C31
TCK         Y34
TCLK0       AE3
TCLK1       AD2
TDO         AB34
TDI         AC35
TMS         W35
TRST        AE35
VDDL        AN1
VDDL        AN35
VDDL        C35
VDDL        C1
VSSL        A3
VSSL        AR3
VSSL        AR33
VSSL        A33
X1          W1
X2/CLKIN    AA1
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Table 14-2. TMS320C40 Pin Assignments Sorted by Pin Number 



Pin 


Signal 


AO 


GDDVqq 


A3 


VSSL 


A5 


ivss 


A7 


DVqd 


A9 


cvss 


A11 


DVss 


A13 


DVqd 


A15 


DVss 


A17 


DVqd 


A19 


cvss 


A21 


DVss 


A23 


DVqd 


A25 


DV S S 


A27 


cv S s 


A29 


DV DD 


A31 


ivss 


A33 


VSSL 


A35 


GDDVdd 


AA1 


X2/CLKIN 


AA3 


LSTATO 


AA5 


LLOCK 


AA31 


DE 


AA33 


CEO 


AA35 


EMUO 


AB2 


LADV DD 


AB4 


LAE 


AB32 


PAGE1 


AB34 


TDO 


AC1 


DVqd 


AC3 


H1 


ACS 


H3 


AC31 


R/W1 


AC33 


STRB1 


AC35 


TDI 


AD2 


TCLK1 


AD4 


LDE 


AD6 


LSTRB1 



Pin     Signal
AD30    STRB0
AD32    STAT0
AD34    EMU1
AE1     CVSS
AE3     TCLK0
AE5     LRDY1
AE31    STAT3
AE33    STAT1
AE35    TRST
AF2     LCE1
AF4     LR/W1
AF6     LRDY0
AF30    RESETLOC0
AF32    R/W0
AF34    STAT2
AG1     DVSS
AG3     LPAGE1
AG5     LCE0
AG31    AE
AG33    PAGE0
AG35    IVSS
AH2     LPAGE0
AH4     LR/W0
AH6     IIOF2
AH30    CRDY4
AH32    CRDY5
AH34    RESETLOC1
AJ1     DVDD
AJ3     LSTRB0
AJ5     NMI
AJ31    CACK5
AJ33    RESET
AJ35    DVSS
AK2     IIOF3
AK4     ROMEN
AK8     C0D7
AK10    C1D4
AK12    C1D3



Pin     Signal
AK24    C4D2
AK26    C4D5
AK28    C5D2
AK32    CACK4
AK34    CSTRB5
AL1     CVSS
AL3     IIOF1
AL5     C0D1
AL7     C1D0
AL9     C1D6
AL11    CSTRB0
AL13    CSTRB1
AL15    CRDY2
AL17    CRDY3
AL19    C2D2
AL21    C2D6
AL23    C3D2
AL25    C3D6
AL27    C4D3
AL29    C5D0
AL31    C5D7
AL33    CREQ5
AL35    CVSS
AM2     DVSS
AM4     C0D3
AM6     C0D5
AM8     C1D2
AM10    CREQ0
AM12    CREQ1
AM14    CACK2
AM16    CACK3
AM18    C2D0
AM20    C2D4
AM22    C3D0
AM24    C3D4
AM26    C4D1
AM28    C4D7


Pin     Signal
AM30    C5D4
AM32    C5D6
AM34    CSTRB4
AN1     VDDL
AN3     IIOF0
AN5     C0D2
AN7     C0D6
AN9     C1D5
AN11    CACK0
AN13    CACK1
AN15    CREQ2
AN17    CREQ3
AN19    C2D1
AN21    C2D5
AN23    C3D1
AN25    C3D5
AN27    C4D0
AN29    C4D6
AN31    C5D3
AN33    CREQ4
AN35    VDDL
AP2     LDDVDD
AP4     C0D0
AP6     C0D4
AP8     C1D1
AP10    C1D7
AP12    CRDY0
AP14    CRDY1
AP16    CSTRB2
AP18    CSTRB3
AP20    C2D3
AP22    C2D7
AP24    C3D3
AP26    C3D7
AP28    C4D4
AP30    C5D1
AP32    C5D5
AP34    LADVDD

Table 14-2. TMS320C40 Pin Assignments Sorted by Pin Number (Concluded)



Pin     Signal
AR1     GADVDD
AR3     VSSL
AR5     IVSS
AR7     CVSS
AR9     DVSS
AR11    DVDD
AR13    CVSS
AR15    DVSS
AR17    DVDD
AR19    CVSS
AR21    DVSS
AR23    DVDD
AR25    CVSS
AR27    DVSS
AR29    DVDD
AR31    IVSS
AR33    VSSL
AR35    LDDVDD
B2      GADVDD
B4      LD26
B6      LD22
B8      LD18
B10     LD12
B12     LD8
B14     LD3
B16     A30
B18     A27
B20     A25
B22     A21
B24     A17
B26     A14
B28     A10
B30     A4
B32     A1
B34     LADVDD



Pin     Signal
C1      VDDL
C3      LD29
C5      LD24
C7      LD20
C9      LD14
C11     LD10
C13     LD6
C15     LD1
C17     A28
C19     A23
C21     A19
C23     A16
C25     A12
C27     A8
C29     A3
C31     SUBS
C33     D28
C35     VDDL
D2      LA0
D4      LA1
D6      LD28
D8      LD23
D10     LD17
D12     LD13
D14     LD9
D16     LD5
D18     LD2
D20     A29
D22     A24
D24     A20
D26     A15
D28     A9
D30     A2
D32     A0
D34     D26



Pin     Signal
E1      CVSS
E3      LA2
E5      LD30
E7      LD25
E9      LD19
E11     LD15
E13     LD11
E15     LD7
E17     LD4
E19     LD0
E21     A26
E23     A22
E25     A18
E27     A13
E29     A7
E31     D30
E33     D25
E35
F2      LA5
F4      LA3
F6      LD31
F8      LD27
F10     LD21
F12     LD16
F24     A6
F26     A11
F28     A5
F32     D31
F34     D23
G1      DVSS
G3      LA7
G5      LA6
G31     D27
G33     D21
G35     DVSS



Pin     Signal
H2      LA9
H4      LA8
H6      LA4
H30     D29
H32     D24
H34     D19
J1
J3      LA13
J5      LA12
J31     D20
J33     D15
J35
K2      LA15
K4      LA14
K6      LA10
K30     D22
K32     D18
K34     D13
L1      DVDD
L3      LA16
L5      LA17
L31     D16
L33     D11
L35     DVDD
M2      LA18
M4      LA19
M6      LA11
M30     D17
M32     D14
M34     D9
N1      CVSS
N3      LA20
N5      LA21
N31     D12
N33     D7
N35     DVSS



Pin     Signal
P2      LA22
P4      LA23
P32     D10
P34     D5
R1      DVSS
R3      LA24
R5      LA25
R31     D8
R33     D4
R35     CVSS
T2      LA26
T4      LA28
T32     D6
T34     D2
U1      LDDVDD
U3      LA27
U5      LA30
U31     D3
U33     D0
U35     GADVDD
V2      GDDVDD
V4      LA29
V32     D1
V34     CE1
W1      X1
W3      IACK
W5      LSTAT3
W31     RDY1
W33     LOCK
W35     TMS
Y2      LSTAT2
Y4      LSTAT1
Y32     RDY0
Y34     TCK

14.2 Signal Descriptions

This section gives signal descriptions for the TMS320C40 device.
Table 14-3 lists each signal, its number of pins, its function, and its operating
mode(s), that is, input, output, or high-impedance state, as indicated by I, O,
or Z. Pins labeled NC must not be connected by the user. A line over a signal
name (for example, RESET) indicates that the signal is active low (true at a
logic 0 level). The signals are grouped according to function.
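The conventions above can be captured in a few lines. The following Python sketch is purely illustrative: the helper names decode_type and is_asserted are hypothetical and are not part of any TI software. It splits a Type entry such as I/O/Z into its capabilities and interprets the logic level of an active-low signal.

```python
def decode_type(code):
    """Split a Table 14-3 type code such as "I/O/Z" into capability flags."""
    letters = {part.strip().upper() for part in code.split("/")}
    unknown = letters - {"I", "O", "Z"}
    if unknown:
        raise ValueError(f"unexpected type letters: {sorted(unknown)}")
    return {
        "input": "I" in letters,
        "output": "O" in letters,
        "high_impedance": "Z" in letters,
    }


def is_asserted(level, active_low):
    """Return True when a signal at logic `level` (0 or 1) is asserted.

    Active-low signals (drawn with an overbar, such as RESET) are true at 0.
    """
    if level not in (0, 1):
        raise ValueError("level must be 0 or 1")
    return level == 0 if active_low else level == 1


print(decode_type("O/Z"))               # type code of STRB0
print(is_asserted(0, active_low=True))  # RESET held at logic 0
```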

Table 14-3. TMS320C40 Signal Descriptions

Signal          Pins  Type*   Description

Global Bus External Interface (80 pins)
D(31-0)         32    I/O/Z   32-bit data port of the global external interface
DE              1     I       Data bus enable signal for the global external interface
A(30-0)         31    O/Z     31-bit address port of the global external interface
AE              1     I       Address bus enable signal for the global bus interface
STAT(3-0)       4     O       Status signals for the global bus interface
LOCK            1     O       Lock signal for the global bus interface
STRB0†          1     O/Z     Access strobe 0 for the global bus interface
R/W0            1     O/Z     Read/write signal for STRB0 accesses
PAGE0           1     O/Z     Page signal for STRB0 accesses
RDY0            1     I       Ready signal for STRB0 accesses
CE0             1     I       Control enable for the STRB0, PAGE0, and R/W0 signals
STRB1†          1     O/Z     Access strobe 1 for the global bus interface
R/W1            1     O/Z     Read/write signal for STRB1 accesses
PAGE1           1     O/Z     Page signal for STRB1 accesses
RDY1            1     I       Ready signal for STRB1 accesses
CE1             1     I       Control enable for the STRB1, PAGE1, and R/W1 signals

† STRB0 and STRB1 and associated signals (R/W1, R/W0, PAGE0, PAGE1, etc.) are effective over the
address ranges defined by the STRB ACTIVE bits, as listed in Table 7-3 on page 7-8.
* I = input, O = output, Z = high impedance.

Table 14-3. TMS320C40 Signal Descriptions (Continued)



Signal          Pins  Type*   Description

Local Bus External Interface (80 pins)
LD(31-0)        32    I/O/Z   32-bit data port of the local external interface
LDE             1     I       Data bus enable signal for the local external interface
LA(30-0)        31    O/Z     31-bit address port of the local external interface
LAE             1     I       Address bus enable signal for the local bus interface
LSTAT(3-0)      4     O       Status signals for the local bus interface
LLOCK           1     O       Lock signal for the local bus interface
LSTRB0†         1     O/Z     Access strobe 0 for the local bus interface
LR/W0           1     O/Z     Read/write signal for LSTRB0 accesses
LPAGE0          1     O/Z     Page signal for LSTRB0 accesses
LRDY0           1     I       Ready signal for LSTRB0 accesses
LCE0            1     I       Control enable for the LSTRB0, LPAGE0, and LR/W0 signals
LSTRB1†         1     O/Z     Access strobe 1 for the local bus interface
LR/W1           1     O/Z     Read/write signal for LSTRB1 accesses
LPAGE1          1     O/Z     Page signal for LSTRB1 accesses
LRDY1           1     I       Ready signal for LSTRB1 accesses
LCE1            1     I       Control enable for the LSTRB1, LPAGE1, and LR/W1 signals

Communication Port 0 Interface (12 pins)
C0D(7-0)        8     I/O     Communication port 0 data bus
CREQ0           1     I/O     Communication port 0 token request signal
CACK0           1     I/O     Communication port 0 token request acknowledge signal
CSTRB0          1     I/O     Communication port 0 data strobe signal
CRDY0           1     I/O     Communication port 0 data ready signal

† LSTRB0 and LSTRB1 and associated signals (LR/W1, LR/W0, LPAGE0, LPAGE1, etc.) are effective over
the address ranges defined by the STRB ACTIVE bits, as listed in Table 7-3 on page 7-8.
* I = input, O = output, Z = three-stated (high impedance).

Table 14-3. TMS320C40 Signal Descriptions (Continued)



Signal          Pins  Type*   Description

Communication Port 1 Interface (12 pins)
C1D(7-0)        8     I/O     Communication port 1 data bus
CREQ1           1     I/O     Communication port 1 token request signal
CACK1           1     I/O     Communication port 1 token request acknowledge signal
CSTRB1          1     I/O     Communication port 1 data strobe signal
CRDY1           1     I/O     Communication port 1 data ready signal

Communication Port 2 Interface (12 pins)
C2D(7-0)        8     I/O     Communication port 2 data bus
CREQ2           1     I/O     Communication port 2 token request signal
CACK2           1     I/O     Communication port 2 token request acknowledge signal
CSTRB2          1     I/O     Communication port 2 data strobe signal
CRDY2           1     I/O     Communication port 2 data ready signal

Communication Port 3 Interface (12 pins)
C3D(7-0)        8     I/O     Communication port 3 data bus
CREQ3           1     I/O     Communication port 3 token request signal
CACK3           1     I/O     Communication port 3 token request acknowledge signal
CSTRB3          1     I/O     Communication port 3 data strobe signal
CRDY3           1     I/O     Communication port 3 data ready signal

Communication Port 4 Interface (12 pins)
C4D(7-0)        8     I/O     Communication port 4 data bus
CREQ4           1     I/O     Communication port 4 token request signal
CACK4           1     I/O     Communication port 4 token request acknowledge signal
CSTRB4          1     I/O     Communication port 4 data strobe signal
CRDY4           1     I/O     Communication port 4 data ready signal

Communication Port 5 Interface (12 pins)
C5D(7-0)        8     I/O     Communication port 5 data bus
CREQ5           1     I/O     Communication port 5 token request signal
CACK5           1     I/O     Communication port 5 token request acknowledge signal
CSTRB5          1     I/O     Communication port 5 data strobe signal
CRDY5           1     I/O     Communication port 5 data ready signal

* I = input, O = output, Z = three-stated (high impedance).

Table 14-3. TMS320C40 Signal Descriptions (Continued)



Signal          Pins  Type*   Description

Interrupts, I/O Flags, Reset, Timer (12 pins)
IIOF(3-0)       4     I/O     Interrupt and I/O flags
NMI             1     I       Nonmaskable interrupt. It is sensitive to a low-going edge.
IACK            1     O       Interrupt acknowledge
RESET           1     I       Reset signal
RESETLOC(1,0)   2     I       Reset-vector location pins
ROMEN           1     I       On-chip ROM enable (0 = disable, 1 = enable)
TCLK0           1     I/O     Timer 0 pin
TCLK1           1     I/O     Timer 1 pin

Clock and Power (4 pins)
X1              1     O       Crystal pin
X2/CLKIN        1     I       Crystal/oscillator pin
H1              1     O       H1 clock
H3              1     O       H3 clock

Emulation (7 pins)
TCK             1     I       JTAG test port clock
TDO             1     O       JTAG test port data out
TDI             1     I       JTAG test port data in
TMS             1     I       JTAG test port mode select
TRST            1     I       JTAG test port reset
EMU0            1     I/O     Emulation pin 0
EMU1            1     I/O     Emulation pin 1

* I = input, O = output, Z = three-stated (high impedance).
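The pin counts in the group headings of Table 14-3 can be cross-checked against the per-signal entries. The Python sketch below performs that tally; the dictionaries are simply a hand transcription of the table (bus widths taken from the Pins column) and the helper names external_bus and comm_port are hypothetical.

```python
def external_bus(prefix):
    """Pins of one external bus; prefix "" gives the global bus, "L" the local bus."""
    pins = {f"{prefix}D(31-0)": 32, f"{prefix}DE": 1,
            f"{prefix}A(30-0)": 31, f"{prefix}AE": 1,
            f"{prefix}STAT(3-0)": 4, f"{prefix}LOCK": 1}
    for n in (0, 1):
        for name in ("STRB", "R/W", "PAGE", "RDY", "CE"):
            pins[f"{prefix}{name}{n}"] = 1
    return pins


def comm_port(n):
    """Pins of communication port n (0-5)."""
    return {f"C{n}D(7-0)": 8, f"CREQ{n}": 1, f"CACK{n}": 1,
            f"CSTRB{n}": 1, f"CRDY{n}": 1}


# group name -> (per-signal pin counts, total stated in the group heading)
GROUPS = {
    "Global bus": (external_bus(""), 80),
    "Local bus": (external_bus("L"), 80),
    **{f"Communication port {n}": (comm_port(n), 12) for n in range(6)},
    "Interrupts, I/O flags, reset, timer": (
        {"IIOF(3-0)": 4, "NMI": 1, "IACK": 1, "RESET": 1,
         "RESETLOC(1,0)": 2, "ROMEN": 1, "TCLK0": 1, "TCLK1": 1}, 12),
    "Clock and power": ({"X1": 1, "X2/CLKIN": 1, "H1": 1, "H3": 1}, 4),
    "Emulation": ({"TCK": 1, "TDO": 1, "TDI": 1, "TMS": 1, "TRST": 1,
                   "EMU0": 1, "EMU1": 1}, 7),
}

for group, (pins, stated) in GROUPS.items():
    counted = sum(pins.values())
    assert counted == stated, f"{group}: counted {counted}, heading says {stated}"

total = sum(sum(pins.values()) for pins, _ in GROUPS.values())
print(f"Signal pins listed in Table 14-3: {total}")
```

Every group total matches its heading, and the listed signals account for 255 pins in all.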
14.3 TMS320C4X Mechanical Data

Figure 14-2. TMS320C40 325-Pin PGA Dimensions

Notes: Dimensions are in inches.
Package designator: GF.

14.4 Electrical Specifications

Table 14-4. Absolute Maximum Ratings Over Specified Temperature Range

Condition/Characteristic              Range
Supply voltage range, VDD             -0.3 V to 7 V
Input voltage range                   -0.3 V to 7 V
Output voltage range                  -0.3 V to 7 V
Operating case temperature range      0 °C to 85 °C
Storage temperature range             -55 °C to 150 °C

Notes: 1) Stresses beyond those listed under "Absolute Maximum Ratings" may cause permanent
          damage to the device. This is a stress rating only; functional operation of the device at
          these or any other conditions beyond those indicated in the "Recommended Operating
          Conditions" table of this specification is not implied. Exposure to absolute-maximum-
          rated conditions for extended periods may affect device reliability.
       2) All voltage values are with respect to VSS.



Table 14-5. Recommended Operating Conditions

      Parameter                               Min     Nom    Max         Unit
VDD   Supply voltages (DDVDD, etc.)           4.75    5      5.25        V
VSS   Supply voltages (CVSS, etc.)                    0                  V
VIH   High-level input voltage                2              VDD + 0.3   V
VIL   Low-level input voltage                 -0.3           0.8         V
IOH   High-level output current                              -300        µA
IOL   Low-level output current                               2           mA
TA    Operating free-air temperature          0              85          °C
VTH   High-level input voltage for CLKIN      2.6            VDD + 0.3   V

Note: Note 1 for Table 14-4 also applies to this table. All input and output voltages are
TTL compatible.
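As a sketch of how these limits might be applied, for example in a board-level test script, the Python functions below check a supply voltage and classify an input voltage. The function names are hypothetical; the numbers are the Min/Max entries of Table 14-5, and CLKIN is treated separately because it has its own high-level row.

```python
# Limits transcribed from Table 14-5 (volts).
VDD_MIN, VDD_MAX = 4.75, 5.25
VIL_MIN, VIL_MAX = -0.3, 0.8
VIH_MIN = 2.0   # TTL inputs other than CLKIN
VTH_MIN = 2.6   # high-level input voltage for CLKIN


def supply_ok(vdd):
    """True if the supply voltage is inside the recommended range."""
    return VDD_MIN <= vdd <= VDD_MAX


def input_level(v, vdd, clkin=False):
    """Classify an input voltage against the Table 14-5 limits.

    Returns "low", "high", or "invalid" (between VIL max and the
    high-level minimum, or outside -0.3 V .. VDD + 0.3 V).
    """
    high_min = VTH_MIN if clkin else VIH_MIN
    if VIL_MIN <= v <= VIL_MAX:
        return "low"
    if high_min <= v <= vdd + 0.3:
        return "high"
    return "invalid"


print(supply_ok(5.0), input_level(2.4, 5.0), input_level(2.4, 5.0, clkin=True))
```

A 2.4-V level that is a valid high on an ordinary TTL input is therefore not a valid high on CLKIN.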
Table 14-6. Electrical Characteristics Over Specified Free-Air Temperature Range

      Electrical Characteristic                                  Min    Nom (Note 1)   Max    Unit
VOH   High-level output voltage (VDD = Min, IOH = Max)           2.4    3                     V
VOL   Low-level output voltage (VDD = Min, IOL = Max)                   0.3            0.6    V
IZ    Three-state current (VDD = Max)                            -20                   20     µA
II    Input current (VI = VSS to VDD)                            -10                   10     µA
IIP   Input current (inputs with internal pull-ups)              -400                  20     µA
      (see Note 4)
ICC   Supply current (TA = 25 °C, VDD = Max, fx = Max)                  350            850    mA
CI    Input capacitance                                                                15     pF
CO    Output capacitance                                                               15     pF

Notes: 1) All nominal values are at VDD = 5 V, TA = 25 °C.
       2) fx is the input clock frequency. The maximum value is 50 MHz.
       3) All input and output voltage levels are TTL compatible.
       4) Pins with internal pull-up devices: TDI, TCK.
       5) Pin with internal pull-down device: TRST.
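A rough first-order estimate of supply power is VDD × ICC. The Python sketch below multiplies the Table 14-6 supply currents by a supply voltage. Pairing 350 mA with 5 V (per Note 1) and 850 mA with the 5.25-V maximum supply is an assumption for illustration. The result is only a supply-side product, not a thermal analysis.

```python
ICC_NOM_A = 0.350   # Table 14-6, Nom column
ICC_MAX_A = 0.850   # Table 14-6, Max column
VDD_NOM_V = 5.0     # Table 14-5, Nom
VDD_MAX_V = 5.25    # Table 14-5, Max


def supply_power_w(vdd_v, icc_a):
    """First-order supply power estimate: P = VDD * ICC."""
    return vdd_v * icc_a


typical = supply_power_w(VDD_NOM_V, ICC_NOM_A)
worst = supply_power_w(VDD_MAX_V, ICC_MAX_A)
print(f"typical ~ {typical:.2f} W, worst case ~ {worst:.2f} W")
```

This gives about 1.75 W typical and about 4.46 W worst case.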



Figure 14-3. Test Load Circuit (tester pin electronics, VLoad, output under test)

Where: IOL = 2.0 mA (all outputs)
       IOH = 300 µA (all outputs)
       VLoad = 2.15 V
       CT = 80 pF typical load circuit capacitance.

14.5 Signal Transition Levels

TTL-level outputs are driven to a minimum logic-high level of 2.4 volts and
to a maximum logic-low level of 0.6 volt. Output transition times are specified
as follows.

For a high-to-low transition on a TTL-compatible output signal, the level at
which the output is said to be no longer high is 2.0 volts, and the level at
which the output is said to be low is 1.0 volt. For a low-to-high transition, the
level at which the output is said to be no longer low is 1.0 volt, and the level
at which the output is said to be high is 2.0 volts.



Figure 14-4. TTL-Level Outputs (reference levels: 2.4 V, 2.0 V, 1.0 V, 0.6 V)



Transition times for TTL-compatible inputs are specified as follows. For a 
high-to-low transition on an input signal, the level at which the input is said 
to be no longer high is 2.0 volts, and the level at which the input is said to 
be low is 0.8 volt. For a low-to-high transition on an input signal, the level 
at which the input is said to be no longer low is 0.8 volt, and the level at which 
the input is said to be high is 2.0 volts. 
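The two paragraphs above define reference levels rather than a procedure, but they translate directly into a measurement rule. The following Python sketch uses hypothetical function names. It classifies a voltage and measures a transition time from a list of (time, voltage) samples, taking the first sample past each reference level with no interpolation between samples.

```python
# Reference levels from Section 14.5 (volts).
LEVELS = {
    "output": {"high": 2.0, "low": 1.0},
    "input": {"high": 2.0, "low": 0.8},
}


def ttl_state(v, kind):
    """Return "high", "low", or "in transition" for an "output" or "input"."""
    ref = LEVELS[kind]
    if v >= ref["high"]:
        return "high"
    if v <= ref["low"]:
        return "low"
    return "in transition"


def transition_time(samples, kind, falling):
    """Time from leaving the starting level to reaching the final level.

    samples: iterable of (time, volts) pairs in time order.
    """
    ref = LEVELS[kind]
    t_start = None
    for t, v in samples:
        if t_start is None:
            left = v < ref["high"] if falling else v > ref["low"]
            if left:
                t_start = t
        if t_start is not None:
            arrived = v <= ref["low"] if falling else v >= ref["high"]
            if arrived:
                return t - t_start
    raise ValueError("waveform does not complete the transition")


falling_output = [(0, 2.4), (1, 2.2), (2, 1.9), (3, 1.4), (4, 1.0), (5, 0.6)]
print(transition_time(falling_output, "output", falling=True))
```

For the sample waveform, the output is no longer high at t = 2 and is low at t = 4, so the measured fall time is 2 time units.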



Figure 14-5. TTL-Level Inputs (reference levels: 2.0 V and 0.8 V; 90% and 10% points)

Figure 14-6. X2/CLKIN Timing

Figure 14-7. H1/H3 Timing

Table 14-7. Timing Parameters for CLKIN, H1, H3 (Figure 14-6 and Figure 14-7)



No.    Name        Description                                TMS320C40      TMS320C40-40   Unit
                                                              Min    Max     Min    Max
(1)    tf(CI)      CLKIN fall time                                   4              4       ns
(2)    tw(CIL)     CLKIN low pulse duration, tc(CI) = min     7              8              ns
(3)    tw(CIH)     CLKIN high pulse duration, tc(CI) = min    7              8              ns
(4)    tr(CI)      CLKIN rise time                                   4              4       ns
(5)    tc(CI)      CLKIN cycle time                           20             25             ns
(6)    tf(H)       H1/H3 fall time                                   3              3       ns
(7)    tw(HL)      H1/H3 low pulse duration                   P-6†           P-6†           ns
(8)    tw(HH)      H1/H3 high pulse duration                  P-7†           P-7†           ns
(9)    tr(H)       H1/H3 rise time                                   4              4       ns
(9.1)  td(HL-HH)   Delay from H1 (H3) low to H3 (H1) high     0      5       0      5       ns
(10)   tc(H)       H1/H3 cycle time                           40     485     50     500     ns

† P = tc(CI), as shown in Figure 14-6.
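The Min entries of rows (5) and (10) (20 ns and 40 ns; 25 ns and 50 ns) are consistent with an H1/H3 period of twice the CLKIN period. Assuming that relationship, the Python sketch below derives the H1/H3 cycle time and the minimum pulse widths of rows (7) and (8) from a chosen CLKIN period. The function and dictionary names are hypothetical.

```python
# Minimum CLKIN cycle time, row (5) of Table 14-7 (ns).
TC_CI_MIN_NS = {"TMS320C40": 20, "TMS320C40-40": 25}


def h1_timing(tc_ci_ns, device="TMS320C40"):
    """Derive H1/H3 figures from P = tc(CI), assuming tc(H) = 2P."""
    if tc_ci_ns < TC_CI_MIN_NS[device]:
        raise ValueError(f"tc(CI) below the {device} minimum")
    p = tc_ci_ns
    return {
        "tc(H)": 2 * p,        # H1/H3 cycle time (assumed to be 2P)
        "tw(HL) min": p - 6,   # row (7)
        "tw(HH) min": p - 7,   # row (8)
    }


print(h1_timing(20))
print(h1_timing(25, "TMS320C40-40"))
```

With the minimum 20-ns CLKIN period this gives a 40-ns H1 cycle, matching the Min entry of row (10) for the TMS320C40.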
Figure 14-8. Memory ((L)STRB = 0) Read

Table 14-8. Timing Parameters for a Memory ((L)STRB = 0) Read/Write

No.    Name             Description                              TMS320C40     TMS320C40-40   Unit
                                                                 Min    Max    Min    Max
(1)    td(H1L-(L)SL)    H1 low to (L)STRB low                           7             7       ns
(2)    td(H1L-(L)SH)    H1 low to (L)STRB high                          7             7       ns
(3)    td(H1H-RWL)      H1 high to (L)R/W low                           7             7       ns
(4)    td(H1L-A)        H1 low to (L)A valid                            7             11      ns
(5)    tsu(D)R          (L)D valid before H1 low (read)          12            13             ns
(6)    th((L)D)R        (L)D hold time after H1 low (read)       0             0              ns
(7)    tsu((L)RDY)      (L)RDY valid before H1 low               20            20             ns
(8)    th((L)RDY)       (L)RDY hold time after H1 low            0             0              ns
(8.1)  td(H1L-S)        H1 low to (L)STAT(3-0) valid                    7             11      ns

Note: For consecutive reads, (L)R/W stays high and (L)STRB stays low.
Figure 14-9. Memory ((L)STRB = 0) Write

Table 14-8. Timing Parameters for a Memory ((L)STRB = 0) Read/Write (Concluded)

No.    Name             Description                              TMS320C40     TMS320C40-40   Unit
                                                                 Min    Max    Min    Max
(9)    td(H1H-(L)RWH)   H1 high to (L)R/W high (write)                  7             7       ns
(10)   tv((L)D)W        (L)D valid after H1 low (write)                 16            16      ns
(11)   th((L)D)W        (L)D hold time after H1 high (write)     0             0              ns
(12)   td(H1H-A)        H1 high to A valid on back-to-back              13            15      ns
                        write cycles (write)

Note: The delay for (L)RDY to become active after the address is valid should be a maximum of
13 ns for the 'C40 and 19 ns for the 'C40-40.
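The limits in this note can be reproduced from the tables under one assumption. Suppose the address becomes valid td(H1L-A) after an H1 falling edge, and (L)RDY must then meet its "(L)RDY valid before H1 low" setup time before the next H1 falling edge, one minimum H1 cycle later. The budget is then tc(H) - td(H1L-A) - setup. The Python sketch below evaluates that budget for both devices. This interpretation is not given in the source, although it reproduces the stated 13 ns and 19 ns.

```python
# ns: tc(H) Min from Table 14-7; "H1 low to (L)A valid" Max and
# "(L)RDY valid before H1 low" Min from Table 14-8.
DEVICES = {
    "TMS320C40":    {"tc_H": 40, "td_H1L_A": 7,  "tsu_RDY": 20},
    "TMS320C40-40": {"tc_H": 50, "td_H1L_A": 11, "tsu_RDY": 20},
}


def rdy_budget_ns(d):
    """Time left for external logic to drive (L)RDY after the address is valid."""
    return d["tc_H"] - d["td_H1L_A"] - d["tsu_RDY"]


for name, params in DEVICES.items():
    print(f"{name}: {rdy_budget_ns(params)} ns")
```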
Figure 14-10. DE, AE, and CE Enable Timing

Table 14-9. DE, AE, and CE Enable Timing

No.    Name            Description                                   TMS320C40     TMS320C40-40   Unit
                                                                     Min    Max    Min    Max
(1)    td(DEH-DZ)      Time (L)DE high to (L)D(0-31) high impedance         15            15      ns
(2)    td(DEL-DV)      Time (L)DE low to (L)D(0-31) valid                   15            15      ns
(3)    td(AEH-AZ)      Time (L)AE high to (L)A(0-30) high impedance         15            15      ns
(4)    td(AEL-AV)      Time (L)AE low to (L)A(0-30) valid                   15            15      ns
(5)    td(CEH-RWZ)     Time (L)CE high to (L)R/W(0,1) high impedance        15            15      ns
(6)    td(CEL-RWV)     Time (L)CE low to (L)R/W(0,1) valid                  15            15      ns
(7)    td(CEH-STRBZ)   Time (L)CE high to (L)STRB(0,1) high impedance       15            15      ns
(8)    td(CEL-STRBV)   Time (L)CE low to (L)STRB(0,1) valid                 15            15      ns
(9)    td(CEH-PAGEZ)   Time (L)CE high to (L)PAGE(0,1) high impedance       15            15      ns
(10)   td(CEL-PAGEV)   Time (L)CE low to (L)PAGE(0,1) valid                 15            15      ns
Figure 14-11. Timing for (L)LOCK When Executing LDFI or LDII

Table 14-10. Timing Parameters for (L)LOCK When Executing LDFI or LDII

No.    Name            Description                TMS320C40     TMS320C40-40   Unit
                                                  Min    Max    Min    Max
(1)    td(H1L-LOCKL)   H1 low to (L)LOCK low             7             11      ns
Figure 14-12. Timing for (L)LOCK When Executing a STFI or STII

Table 14-11. Timing Parameters for (L)LOCK When Executing STFI or STII

No.    Name            Description                TMS320C40     TMS320C40-40   Unit
                                                  Min    Max    Min    Max
(1)    td(H1L-LOCKH)   H1 low to (L)LOCK high            7             11      ns
Figure 14-13. Timing for (L)LOCK When Executing SIGI

Table 14-12. Timing Parameters for (L)LOCK When Executing SIGI

No.    Name            Description                TMS320C40     TMS320C40-40   Unit
                                                  Min    Max    Min    Max
(1)    td(H1L-LOCKL)   H1 low to (L)LOCK low             7             11      ns
(2)    td(H1L-LOCKH)   H1 low to (L)LOCK high            7             11      ns
Figure 14-14. Timing Parameters for (L)PAGE(0,1)

Table 14-13. Timing Parameters for (L)PAGE(0,1) During Memory Accesses to a Different Page

No.    Name         Description                             TMS320C40     TMS320C40-40   Unit
                                                            Min    Max    Min    Max
(1)    td(H1L-PH)   H1 low to PAGE high for access to       0      7             11      ns
                    different page
(2)    td(H1L-PL)   H1 low to PAGE low for access to        0      7             11      ns
                    different page
Figure 14-15. Timing for Loading IIF Register (IIOF Pins) When Configured as an Output Pin

Table 14-14. Timing Parameters for Loading IIF Register When Configured as an Output Pin

No.    Name         Description              TMS320C40     TMS320C40-40   Unit
                                             Min    Max    Min    Max
(1)    tv(H1L-IF)   H1 low to IIOF valid            11            12      ns
Figure 14-16. Change of IIOF From Output to Input Mode

Table 14-15. Timing Parameters of IIOF Changing From Output to Input Mode

No.    Name           Description                  TMS320C40     TMS320C40-40   Unit
                                                   Min    Max    Min    Max
(1)    th(H1L-IF01)   IIOF hold after H1 low              11            12      ns
(2)    tsu(IF)        IIOF setup before H1 low     8             8              ns
(3)    th(IF)         IIOF hold after H1 low       0             0              ns
Figure 14-17. Change of IIOF From Input to Output Mode

Table 14-16. Timing Parameters of IIOF Changing From Input to Output Mode

No.    Name           Description                               TMS320C40     TMS320C40-40   Unit
                                                                Min    Max    Min    Max
(1)    td(H1L-XFIO)   H1 low to IIOF switching from input              16            16      ns
                      to output
Figure 14-18. RESET Timing

Notes: 1) (L)D includes D(31-0), LD(31-0), and CxD(7-0).
       2) (L)A includes A(30-0).
       3) Control signals LSTRB0, LSTRB1, STRB0, STRB1, (L)STAT(3-0), (L)LOCK, (L)R/W0, and (L)R/W1
          go high, while (L)PAGE0 and (L)PAGE1 go low.
       4) Asynchronously reset signals that go into high impedance after RESET goes low include
          TCLK0, TCLK1, IIOF(3-0), and the communication port control signals CREQx, CACKy, CSTRBy,
          and CRDYx (where x = 0, 1, or 2, and y = 3, 4, or 5). (At reset, ports 0, 1, and 2 become
          outputs, and ports 3, 4, and 5 become inputs.)
       5) Asynchronously reset signals that go to a high logic level after RESET goes low include
          CREQy, CACKx, CSTRBx, and CRDYy (where x = 0, 1, or 2, and y = 3, 4, or 5).
       6) RESET is an asynchronous input and can be asserted at any point during a clock cycle. If
          the specified timings are met, the exact sequence shown will occur; otherwise, an
          additional delay of one clock cycle may occur.
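Notes 4 and 5 pair up: for each communication port, two control signals float and two are driven high. Which two depends on whether the port comes out of reset as an output (ports 0-2) or an input (ports 3-5). The Python sketch below encodes that rule; the function name is hypothetical.

```python
def comm_port_reset_state(port):
    """Control-signal states after RESET goes low (Figure 14-18, Notes 4 and 5)."""
    if port in (0, 1, 2):
        return {"direction": "output",
                "high_impedance": [f"CREQ{port}", f"CRDY{port}"],
                "driven_high": [f"CACK{port}", f"CSTRB{port}"]}
    if port in (3, 4, 5):
        return {"direction": "input",
                "high_impedance": [f"CACK{port}", f"CSTRB{port}"],
                "driven_high": [f"CREQ{port}", f"CRDY{port}"]}
    raise ValueError("communication ports are numbered 0 through 5")


for p in range(6):
    print(p, comm_port_reset_state(p))
```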
Table 14-17. Timing Parameters for RESET (Figure 14-18)

No.    Name                  Description                                   TMS320C40     TMS320C40-40   Unit
                                                                           Min    Max    Min    Max
(1)    tsu(RESET)            Setup for RESET before CLKIN low              8      P†     8      P†      ns
(2.1)  td(CLKINH-H1H)        CLKIN high to H1 high                         5      12     5      13      ns
(2.2)  td(CLKINH-H1L)        CLKIN high to H1 low                          5      12     5      13      ns
(3)    tsu(RESETH-H1L)       Setup for RESET high before H1 low and        8             8              ns
                             after 10 H1 clock cycles
(4.1)  td(CLKINH-H3L)        CLKIN high to H3 low                          5      12     5      13      ns
(4.2)  td(CLKINH-H3H)        CLKIN high to H3 high                         5      12     5      13      ns
(5)    tdis(H1H-XD)          H1 high to (L)D high impedance                       15            15      ns
(6)    tdis(H3H-XA)          H3 high to (L)A high impedance                       15            15      ns
(7)    td(H3H-CONTROLH)      H3 high to control signals high                      7             7       ns
                             (low for (L)PAGE)
(8)    td(H1H-IACKH)         H1 high to IACK high                                 7             7       ns
(9)    tdis(RESETL-ASYNCH)   RESET low to asynchronously reset signals            15            15      ns
                             high impedance
(10)   td(RESETL-COMMH)      RESET low to asynchronously reset signals            10            10      ns
                             high

† P = tc(CI), the CLKIN period as shown in Figure 14-6.
Figure 14-19. IIOF(3-0) Interrupt Response Timing

Table 14-18. Timing Parameters for IIOF(3-0)

No.    Name        Description                          TMS320C40            TMS320C40-40         Unit
                                                        Min   Typ    Max     Min   Typ    Max
(1)    tsu(IIOF)   IIOF(3-0) setup before H1 low        11                   12                   ns
(2)    tw(IIOF)    Interrupt pulse width to guarantee   P     1.5P   <2P     P     1.5P   <2P     ns
       (Note 1)    one interrupt seen

Notes: 1) Interrupt pulse width must be at least 1P wide (P = one H1 period) to guarantee it will
          be seen. It must be less than 2P wide to guarantee it will be responded to only once.
          The recommended pulse width is 1.5P.
       2) IIOF is an asynchronous input and can be asserted at any point during a clock cycle.
          If the specified timings are met, the exact sequence shown will occur; otherwise, an
          additional delay of one clock cycle may occur.
       3) The 'C40 can accept an interrupt from the same source every two H1 clock cycles.
       4) For edge-triggered interrupts, only timing number (1) applies.
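Note 1 amounts to a simple window on the pulse width of a level-triggered interrupt. Note 4 states that only the setup time applies to edge-triggered interrupts. The Python sketch below classifies a pulse against that window for a given H1 period P; the function name is hypothetical.

```python
def classify_iiof_pulse(width_ns, p_ns):
    """Classify a level-triggered IIOF pulse per Note 1 of Table 14-18.

    p_ns is one H1 period (P).
    """
    if width_ns < p_ns:
        return "not guaranteed to be seen"
    if width_ns < 2 * p_ns:
        return "seen and responded to once"
    return "may be responded to more than once"


P = 40  # ns; the minimum H1 cycle time from Table 14-7
for width in (30, 1.5 * P, 90):
    print(width, classify_iiof_pulse(width, P))
```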
Figure 14-20. IACK Timing

Table 14-19. Timing Parameters for IACK

No.    Name            Description                                TMS320C40     TMS320C40-40   Unit
                                                                  Min    Max    Min    Max
(1)    td(H1H-IACKL)   H1 high to IACK low                               7             7       ns
(2)    td(H1H-IACKH)   H1 high to IACK high during first                 7             7       ns
                       cycle of IACK instruction data read

Note: The IACK output is active for the entire duration of the bus cycle and is therefore extended
if the bus cycle uses wait states.
Figure 14-21. Communication-Port Word-Transfer Cycle Timing

Note: For correct operation during token exchange, the two communicating 'C40s must have CLKIN
frequencies within a factor of 2 of each other (in other words, one 'C40 can be at most twice as
fast as the other).
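The factor-of-2 rule in this note is easy to check in a system-configuration script. A minimal Python sketch follows; the function name is hypothetical, and the frequencies may be given in any common unit.

```python
def token_exchange_clocks_ok(clkin_a, clkin_b):
    """True if two CLKIN frequencies are within a factor of 2 of each other."""
    if clkin_a <= 0 or clkin_b <= 0:
        raise ValueError("frequencies must be positive")
    slower, faster = sorted((clkin_a, clkin_b))
    return faster <= 2 * slower


# A 'C40 at 50 MHz linked to a 'C40-40 at 40 MHz
print(token_exchange_clocks_ok(50e6, 40e6))
```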



Table 14-20. Communication-Port Word-Transfer Cycle Timing

No.    Name         Description                          TMS320C40*                  TMS320C40-40*               Unit
                                                         Min†         Max†           Min†         Max†
(1)    tWORD        Word transfer period                 1.5P + 46    2.5P + 202     1.5P + 46    2.5P + 202     ns
                    (4 bytes = 1 word)
(2)    td(RL-SL)W   CRDY low to CSTRB low between        1.5P + 7     2.5P + 28      1.5P + 7     2.5P + 28      ns
                    back-to-back write cycles

† P is the duration of the H1 clock period, with a minimum value of 40 ns (P ≥ 40 ns).
* For these timing values, it is assumed that the 'C40 receiving data is ready to receive data.
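Because the Table 14-20 limits are linear in P, they are convenient to evaluate in code. The Python sketch below computes the tWORD range for a given H1 period and the corresponding data rate. It assumes 4 bytes per word and a receiver that is always ready, as the table footnote does. The function names are hypothetical.

```python
BYTES_PER_WORD = 4


def word_period_ns(p_ns):
    """Min and max tWORD from Table 14-20 for an H1 period P (ns)."""
    if p_ns < 40:
        raise ValueError("P must be at least 40 ns")
    return 1.5 * p_ns + 46, 2.5 * p_ns + 202


def rate_mbytes_per_s(period_ns):
    """Data rate for one word every period_ns (1 byte/ns = 1000 MB/s)."""
    return BYTES_PER_WORD / period_ns * 1000


fastest, slowest = word_period_ns(40)
print(f"tWORD: {fastest} to {slowest} ns")
print(f"rate:  {rate_mbytes_per_s(slowest):.1f} to {rate_mbytes_per_s(fastest):.1f} MB/s")
```

At P = 40 ns this gives a word period between 106 ns and 302 ns.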
Figure 14-22. Communication Port Byte Timing (Write and Read): (a) Write Timing, (b) Read Timing

Table 14-21. Communication Port Byte Timing (Write and Read)

No.    Name         Description                               TMS320C40     TMS320C40-40   Unit
                                                              Min    Max    Min    Max
(1)    tsu(CD)W     Data valid before CSTRB (write)           2             2              ns
(2)    td(RL-SH)W   CRDY low to CSTRB high (write)            3      15     3      15      ns
(3)    th(CD)W      CD hold after CRDY low (write)            2             2              ns
(4)    td(RH-SL)W   CRDY high to CSTRB low for                3      15     3      15      ns
                    subsequent bytes (write)
(5)    tBYTE        Byte period                               12     54     12     54      ns
(6)    td(SL-RL)R   CSTRB low to CRDY low (read)              3      12     3      12      ns
(7)    tsu(CD)R     CD valid after CSTRB (read)               0             0              ns
(8)    th(CD)R      CD held valid after CRDY low (read)       0             0              ns
(9)    td(SH-RH)R   CSTRB high to CRDY high (read)            3      12     3      12      ns
Figure 14-23. Communication Token Transfer Sequence From an Input to an Output Port

Note: Before the token exchange, CREQ and CRDY are output signals asserted by the 'C40
that is receiving data. CACK, CSTRB, and CD(7-0) are input signals asserted by the
device sending data to the 'C40; these are asynchronous with respect to the H1 clock of
the receiving 'C40. After token exchange, CACK, CSTRB, and CD(7-0) become output
signals, and CREQ and CRDY become inputs.

Table 14-22. Communication Token Transfer Sequence From an Input to an Output Port
(Figure 14-23)



No.    Name          Description                               TMS320C40             TMS320C40-40          Unit
                                                               Min        Max        Min        Max
(1)                  CACK low to CSTRB changed from input      0.5P+6     1.5P+22    0.5P+6     1.5P+22    ns
                     to a high-level output
(2)†                 CACK low to start of CREQ going high      P+5        2P+20      P+5        2P+20      ns
                     for token request acknowledge
(3)                  Start of CREQ going high to CREQ          0.5P-5     0.5P+13    0.5P-5     0.5P+13    ns
                     change from output to an input
(4)                  Start of CREQ going high to CACK          0.5P-5     0.5P+13    0.5P-5     0.5P+13    ns
                     change from an input to an output
                     level high
(4.1)                Start of CREQ going high to CD(7-0)       0.5P-5     0.5P+13    0.5P-5     0.5P+13    ns
                     change from inputs driven to outputs
                     driven
(4.2)                Start of CREQ going high to CRDY          0.5P-5     0.5P+13    0.5P-5     0.5P+13    ns
                     change from an output to an input
(5)    td(RQH-SL)T   Start of CREQ going high to CSTRB low     1.5P-8     1.5P+9     0.5P-8     1.5P+9     ns
                     for start of word transfer out
(6)    td(RL-SL)T    CRDY low at end of word input to          3.5P+12    5.5P+48    3.5P+12    5.5P+48    ns
                     CSTRB low for word output

† These timing parameters result from synchronizer delays and are referenced from the falling
edge of H1. The inputs (that cause the output-signal pins to change values) are sampled on H1
falling. The minimum delay occurs when the input condition occurs just before H1 falling, and the
maximum delay occurs when the input condition occurs just after H1 falling.
Figure 14-24. Communication Token Transfer Sequence From an Output to an Input Port

[Timing diagram: CREQ, CACK, CSTRB, CD(7-0), and CRDY during the token transfer, annotated with parameters (1) through (8) of Table 14-23. Annotations mark where CACK, CSTRB, and CD(7-0) change from outputs to inputs and where CREQ and CRDY change from inputs to outputs. Shading indicates when a signal is an input; clear indicates when it is an output.]



Note: Before the token exchange, CACK, CSTRB, and CD(7-0) are asserted by the 'C40 sending data. CREQ and CRDY are input signals asserted by the 'C40 receiving data and are asynchronous with respect to the H1 clock of the sending 'C40. After the token exchange, CREQ and CRDY become outputs, and CSTRB, CACK, and CD(7-0) become inputs.



Table 14-23. Communication Token Transfer Sequence From an Output to an Input Port (Figure 14-24)

No. | Name | Description | TMS320C40 Min | TMS320C40 Max | TMS320C40-40 Min | TMS320C40-40 Max | Unit
--- | --- | --- | --- | --- | --- | --- | ---
(1)† | td(RQL-AL) | CREQ low to start of CACK going low for token request acknowledge | P+5 | 2P+22 | P+5 | 2P+22 | ns
(2)† | td(RL-AL) | CRDY low at end of word transfer out to start of CACK going low | P+6 | 2P+27 | P+6 | 2P+27 | ns
(3) | td(AL-CD) | Start of CACK going low to CD(7-0) change from outputs to inputs | 0.5P-8 | 0.5P+8 | 0.5P-8 | 0.5P+8 | ns
(4) | td(AL-RO) | Start of CACK going low to CRDY change from an input to a high-level output | 0.5P-8 | 0.5P-8 | 0.5P-8 | 0.5P-8 | ns
(5)† | td(RQH-AQ) | CREQ high to CREQ change from an input to a high-level output | 4 | 22 | 4 | 22 | ns
(6)† | td(RQH-AI) | CREQ high to CACK change from an output to an input | 4 | 22 | 4 | 22 | ns
(7)† | td(RQH-SI) | CREQ high to CSTRB change from an output to an input | 4 | 22 | 4 | 22 | ns
(8)† | td(RQH-RQL) | CREQ high to CREQ low for the next token request | P-4 | 2P+8 | P-4 | 2P+8 | ns

† These timing parameters result from synchronizer delays and are referenced from the falling edge of H1. The inputs (that cause the output-signal pins to change values) are sampled on H1 falling. The minimum delay occurs when the input condition occurs just before H1 falling, and the maximum delay occurs when the input condition occurs just after H1 falling.
Figure 14-25. Timer Pin Timings

[Timing diagram of H1 and the TCLK pin showing parameters (1) through (3) of Table 14-24.]



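Table 14-24. Timing Parameters for Timer Pin

No. | Name | Description | TMS320C40 Min | TMS320C40 Max | TMS320C40-40 Min | TMS320C40-40 Max | Unit
--- | --- | --- | --- | --- | --- | --- | ---
(1) | tsu(TCLKH1L) | TCLK setup before H1 low | 9 | | 10 | | ns
(2) | th(TCLKH1L) | TCLK hold after H1 low | 0 | | 0 | | ns
(3) | td(TCLKH1H) | TCLK valid after H1 high | | 7 | | 7 | ns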



Note: Period and polarity of valid logic level are specified by contents of internal control registers. 
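Rows (1) and (2) of Table 14-24 can be checked against simulated or measured edge times. The sketch below assumes all times are in nanoseconds on one common time base, that `t_change_ns` is when TCLK settles at the level to be sampled, and that `t_next_change_ns` is when it next changes; the function name `tclk_input_ok` is hypothetical.

```python
# Minimum setup and hold times (ns) for TCLK as an input, Table 14-24 rows (1)-(2).
TCLK_LIMITS_NS = {
    "TMS320C40": {"setup": 9.0, "hold": 0.0},
    "TMS320C40-40": {"setup": 10.0, "hold": 0.0},
}


def tclk_input_ok(device: str, t_change_ns: float, t_h1_fall_ns: float,
                  t_next_change_ns: float) -> bool:
    """True if a TCLK level that settles at t_change_ns and stays stable until
    t_next_change_ns meets setup and hold around the H1 falling edge."""
    lim = TCLK_LIMITS_NS[device]
    setup = t_h1_fall_ns - t_change_ns       # time stable before H1 falls
    hold = t_next_change_ns - t_h1_fall_ns   # time stable after H1 falls
    return setup >= lim["setup"] and hold >= lim["hold"]
```

A 9.5-ns setup passes on the TMS320C40 but fails the 10-ns requirement of the TMS320C40-40.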
Figure 14-26. JTAG Emulation Timings

[Timing diagram of TCK, TMS/TDI, and TDO showing parameters (1) through (3) of Table 14-25.]



Table 14-25. Timing Parameters for JTAG Emulation

No. | Name | Description | TMS320C40 Min | TMS320C40 Max | TMS320C40-40 Min | TMS320C40-40 Max | Unit
--- | --- | --- | --- | --- | --- | --- | ---
(1) | tsu(TMS-TCKH) | TMS/TDI setup to TCK high | 10 | | 10 | | ns
(2) | th(TMS/TDI) | TMS/TDI hold from TCK high | 5 | | 5 | | ns
(3) | td(TCKL-TDOV) | TCK low to TDO valid | 0 | 15 | 0 | 15 | ns
Appendix A

TMS320C4x Sockets



This appendix describes sockets available to accept the TMS320C4x pin grid array (PGA). Both sockets covered in this appendix feature zero insertion force (ZIF):

□ a tool-activated ZIF socket (TAZ) 

□ a handle-activated ZIF socket (HAZ). 

The sockets described herein are manufactured by AMP Incorporated®. 
A.1 Tool-Activated ZIF PGA Socket (TAZ)




This socket requires AMP™ actuator tool: 354234-1 
Description: 

□ AMP part number: 382533-9 

□ pin positions: 325 

□ solder tail length: 0.170 in. for PC boards 0.125 in. thick (other tail lengths available)

Features: 

□ slightly larger than PGA device 

□ easy package loading because of large funnel entry 

□ zero insertion force 

□ contact wiping action during insertion ensures clean contact points 

□ spring-loaded cover ensures proper loading 

□ can be used with robotic insertion and removal 

□ horizontal (rather than vertical) socket forces prevent damage to the device



A.2 Handle-Activated ZIF PGA Socket (HAZ) 

Figure A-2. Handle-Activated ZIF Socket 




Description:

□ AMP part number: 382320-9

□ pin positions: 325

□ solder tail length: 0.170 in. for PC boards 0.125 in. thick (other tail lengths available)

□ Dimensions:
  Height: 0.350 inch maximum to device plane and 0.650 inch to top of handle in closed position
  Width: 2.700 by 2.875 inches maximum



Features: 

□ can be used for test and burn-in 

□ spring contacts are normally closed 

□ easy package loading because of large funnel entry 

□ zero insertion force 

□ contact wiping action during socket closing ensures clean contact 
points 

□ operating temperature is 160°C (burn-in capability)
Appendix B

XDS510 Design Considerations 



The information in this appendix is intended to assist you in meeting the design requirements of the XDS510 emulator. This information supports XDS510 Cable No. 2563988-001, Rev. B.

The TMS320C4x family supports emulation through a dedicated emulation port. The emulation port is a superset of the IEEE 1149.1 (JTAG) standard and can be accessed by the XDS510 emulator. For details on the JTAG protocol, refer to the IEEE 1149.1 specification.

This appendix contains the following sections: 



B.1 Header and Header Signals

B.2 Bus Protocol

B.3 Cable Pod

B.4 Test Clock Generated in Target System

B.5 Multiprocessor Configuration

B.6 Emulation Timing Calculations
B.1 Header and Header Signals

To perform emulation with the XDS510, your target system must have a 
14-pin header (two 7-pin rows) with connections as shown in Figure B-1. 
Table B-1 describes the emulation signals. 

Although you can use other headers, recommended parts include: 

□ Straight header, unshrouded: DuPont Electronics™ part number 67996-114

□ Right-angle header, unshrouded: DuPont Electronics™ part number 68405-114



Figure B-1. 14-Pin Header Signals and Header Dimensions

Signal | Pin | Pin | Signal
--- | --- | --- | ---
TMS | 1 | 2 | TRST
TDI | 3 | 4 | GND
PD (+5 V) | 5 | 6 | No pin (key)
TDO | 7 | 8 | GND
TCK_RET | 9 | 10 | GND
TCK | 11 | 12 | GND
EMU0 | 13 | 14 | EMU1

Header Dimensions: 

Pin-to-pin spacing: 0.100 in. (X,Y) 
Pin width: 0.025 in., square post 
Pin length: 0.235 in., nominal 
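When reviewing a schematic or netlist, it can help to compare the target's header connections against the pin assignment in Figure B-1. The sketch below encodes that assignment as a dictionary and reports mismatches. The input format (a pin-number-to-net-name mapping whose net names match the signal names) and the function name `check_header` are assumptions, not part of any TI tool.

```python
# 14-pin emulation header assignment from Figure B-1.
# None marks pin 6, which is removed to key the connector.
XDS510_HEADER = {
    1: "TMS", 2: "TRST",
    3: "TDI", 4: "GND",
    5: "PD", 6: None,
    7: "TDO", 8: "GND",
    9: "TCK_RET", 10: "GND",
    11: "TCK", 12: "GND",
    13: "EMU0", 14: "EMU1",
}


def check_header(netlist: dict[int, str]) -> list[str]:
    """Compare a pin -> net mapping with Figure B-1 and list any problems."""
    problems = []
    for pin, expected in XDS510_HEADER.items():
        actual = netlist.get(pin)
        if expected is None:
            # The key position must stay unconnected.
            if actual is not None:
                problems.append(f"pin {pin} is the key position but connects to {actual}")
        elif actual != expected:
            problems.append(f"pin {pin}: expected {expected}, found {actual}")
    return problems
```

An empty list means every populated pin matches the figure and the key position is unused.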



Table B-1. 14-Pin Header Signal Description

XDS510 Signal | XDS510 State† | Target State† | Description
--- | --- | --- | ---
TMS | O | I | JTAG test mode select
TDI | O | I | JTAG test data input
TDO | I | O | JTAG test data output
TCK | O | I | JTAG test clock. TCK is a 10-MHz clock source from the emulation cable pod. This signal can be used to drive the system test clock.
TRST | O | I | JTAG test reset
EMU0 | I | I/O | Emulation pin 0
EMU1 | I | I/O | Emulation pin 1
PD | I | O | Presence detect. Indicates that the emulation cable is connected and that the target is powered up. PD should be tied to +5 volts in the target system.
TCK_RET | I | O | JTAG test clock return. Test clock input to the XDS510 emulator. May be a buffered or unbuffered version of TCK.

† I = input; O = output



B.2 Bus Protocol 

The IEEE 1149.1 specification covers the requirements for JTAG bus slave devices (such as the TMS320C4x family) and provides certain rules, summarized as follows:

□ The TMS/TDI inputs are sampled on the rising edge of the device's TCK signal.

□ The TDO output is clocked from the falling edge of the device's TCK signal.

When JTAG devices are daisy-chained together, the TDO of one device has approximately half a TCK cycle of setup time to the next device's TDI signal. This timing scheme minimizes the race conditions that would occur if both TDO and TDI were timed from the same TCK edge. The penalty for this timing scheme is a reduced TCK frequency.
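The half-cycle budget can be made concrete. As a sketch, assuming TDO is launched on TCK falling, TDI is sampled on the next TCK rising edge, and board delays are negligible, the TDO-to-TDI path must fit in the TCK low time. With the 'C4x values from Table 14-25 (TDO valid at most 15 ns after TCK low, TMS/TDI setup of 10 ns), a 50 percent duty-cycle clock needs a period of at least 50 ns (20 MHz). The helper names are hypothetical.

```python
def min_chain_tck_period_ns(tdo_delay_ns: float, setup_ns: float,
                            low_fraction: float = 0.5) -> float:
    """Shortest TCK period for which TDO (launched on TCK falling) meets the
    next device's TDI setup (sampled on the following TCK rising edge).
    low_fraction is the fraction of each period that TCK spends low."""
    return (tdo_delay_ns + setup_ns) / low_fraction


def setup_margin_ns(period_ns: float, tdo_delay_ns: float, setup_ns: float,
                    low_fraction: float = 0.5) -> float:
    """Setup slack at the next device's TDI; negative means a violation."""
    return period_ns * low_fraction - tdo_delay_ns - setup_ns
```

At a 100-ns TCK period, for example, the slack is 25 ns; at 40 ns it is negative, which is the frequency penalty described above.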

The IEEE 1149.1 specification does not provide rules for JTAG bus master (XDS510) devices. Instead, it expects a bus master to provide timings compatible with the bus slave rules. The XDS510 provides timings that meet the bus slave rules and also provides an optional timing mode that allows you to run the emulation at a much higher frequency for improved performance by avoiding the timing penalty described above.
B.3 Cable Pod

Figure B-2 shows a portion of the XDS510 emulator cable pod. The functional features of the emulator pod are as follows:

□ Signals TDO and TCK_RET can be parallel-terminated inside the pod 
if required by the application. The default is that these signals are not 
terminated. 

□ Signal TCK is driven with a 74AS1034 device. Because of the high current drive (48 mA IOL/IOH), this signal can be parallel terminated. If TCK is tied to TCK_RET, then you can use the parallel terminator in the pod.

□ Signals TMS and TDI can be generated from the falling edge of TCK_RET, according to the IEEE 1149.1 bus slave device timing rules. They can also be driven from the rising edge of TCK_RET, which allows a higher TCK_RET frequency. The default is to match the IEEE 1149.1 slave device timing rules; rising-edge timing is an emulator software option that can be selected when the emulator is invoked. In general, single-processor applications can benefit from the higher clock frequency. However, in multiprocessing applications, you may wish to use the IEEE 1149.1 bus slave timing mode to minimize emulation system timing constraints.

□ Signals TMS and TDI are series terminated to reduce signal reflections. 

□ A 10-MHz test clock source is provided. You may also provide your own test clock for greater flexibility.






Figure B-2. Emulator Pod Interface

[Schematic of the pod side of the 14-pin header: the pod drives TMS (pin 1), TDI (pin 3), TRST (pin 2), and TCK (pin 11, from a 10-MHz source through a 74AS1034 driver), and receives TDO (pin 7), TCK_RET (pin 9), EMU0 (pin 13), EMU1 (pin 14), and PD (pin 5). Optional 180-Ω/270-Ω parallel terminators are selected by jumpers JP1 and JP2. Other labeled parts include 74AS1004, 74AS74, 74F175, and 74AS258 devices and 10-kΩ resistors.]



Figure B-3 and Table B-2 show the signal timings for the XDS510. Timing parameters are calculated from standard data sheet parts used in the cable pod. These timings are for reference only. Texas Instruments does not test or guarantee these timings.

The emulator pod uses TCK_RET as its clock source for internal synchronization. TCK is provided as an optional target system test clock source.



Figure B-3. Emulator Pod Timings

[Timing diagram of TCK_RET, TMS/TDI (default timing), TMS/TDI (optional timing), and TDO, showing parameters 1 through 7 of Table B-2.]



Table B-2. Emulator Pod Timing Parameters

No. | Reference | Description | Min | Max | Unit
--- | --- | --- | --- | --- | ---
1 | tTCKmin, tTCKmax | TCK_RET period | 35 | 200 | ns
2 | tTCKhighmin | TCK_RET high pulse duration | 15 | | ns
3 | tTCKlowmin | TCK_RET low pulse duration | 15 | | ns
4 | td(XTMXmin), td(XTMXmax) | TMS/TDI valid from TCK_RET low (default timing) | 6 | 20 | ns
5 | td(XTMSmin), td(XTMSmax) | TMS/TDI valid from TCK_RET high (optional timing) | 7 | 24 | ns
6 | tsu(XTDOmin) | TDO setup time to TCK_RET high | 3 | | ns
7 | th(XTDOmin) | TDO hold time from TCK_RET high | 12 | | ns
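A candidate TCK_RET clock can be screened against rows 1 through 3 of Table B-2. This is a sketch only: it treats those reference values (which Texas Instruments does not test or guarantee) as hard limits, and `duty_high`, the fraction of each period that TCK_RET is high, is an assumed input. The function name is hypothetical.

```python
# TCK_RET limits from Table B-2, rows 1-3 (ns).
TCK_RET_PERIOD_MIN_NS = 35.0
TCK_RET_PERIOD_MAX_NS = 200.0
TCK_RET_PULSE_MIN_NS = 15.0


def tck_ret_ok(freq_mhz: float, duty_high: float) -> bool:
    """True if a TCK_RET clock at freq_mhz, high for duty_high of each
    period, satisfies the period and pulse-duration limits."""
    period_ns = 1000.0 / freq_mhz
    high_ns = period_ns * duty_high
    low_ns = period_ns - high_ns
    return (TCK_RET_PERIOD_MIN_NS <= period_ns <= TCK_RET_PERIOD_MAX_NS
            and high_ns >= TCK_RET_PULSE_MIN_NS
            and low_ns >= TCK_RET_PULSE_MIN_NS)
```

The pod's own 10-MHz clock passes easily; near the 35-ns period limit, a 40/60 duty cycle fails the 15-ns pulse-duration rows even though the period itself is legal.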



It is extremely important to provide high-quality signals between the emulator and the target processor. If the distance between the emulation header and the processor is greater than 6 inches, the emulation signals should be buffered. Sections B.4 and B.5 illustrate typical connections between the target processor and the emulation header.






B.4 Test Clock Generated in Target System 

Figure B-4 shows an application with the system test clock generated in the target system. In this application, the TCK signal is left unconnected.

Figure B-4. Target-System Generated Test Clock 

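[Schematic: the emulator header is more than 6 inches from the TMS320C4x. The processor EMU0, EMU1, TRST, TMS, TDI, and TDO pins connect to the corresponding header pins; a system test clock drives the processor TCK and the header TCK_RET (pin 9); the header TCK (pin 11) is not connected (NC); PD (pin 5) is tied to +5 V, and the GND pins are grounded.]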



There are two benefits to having the target system generate the test clock: 

1) You can set the test clock frequency to match your system requirements. The emulator provides only a single 10-MHz test clock.

2) You may have other devices in your system that require a test clock 
when the emulator is not connected. 
B.5 Multiprocessor Configuration

Figure B-5. Multiprocessor Connections 



[Schematic: two TMS320C4x devices, with the TDO of the first daisy-chained to the TDI of the second. The TMS, TDI, TDO, and TCK signals pass through buffers, with pullups to +5 V.]



Figure B-5 shows a typical multiprocessor configuration. This is a daisy-chained configuration (TDO-TDI daisy-chained) that meets the minimum requirements of the IEEE 1149.1 specification. The emulation signals in this example are buffered to isolate the processors from the emulator and to provide adequate signal drive for the target system. One of the benefits of a JTAG test interface is that you can generally slow down the test clock to eliminate timing problems. Key points for multiprocessor support are as follows:

□ The processor TMS, TDI, TDO, and TCK should be buffered through 
the same physical package to better control timing skew. 

□ The input buffers for TMS, TDI, and TCK should have pullups to 5 volts. This holds these signals at a known value when the emulator is not connected. A pullup of 4.7 kΩ or greater is suggested.

□ Buffering EMU0 and EMU1 is optional but highly recommended to provide isolation. These are not critical signals and do not need to be buffered through the same physical package as TMS, TCK, TDI, and TDO. Unbuffered and buffered signals are shown in Figure B-6 and Figure B-7.
No signal buffering. In this situation, the distance between the header and the processor should be no more than 6 inches.

Figure B-6. Unbuffered Signals 

[Schematic: the TMS320C4x is 6 inches or less from the emulator header. Its EMU0, EMU1, TRST, TMS, TDI, TDO, and TCK pins connect directly, without buffers, to the corresponding header pins; PD is tied to +5 V, and the GND pins are grounded.]



Emulation signals buffered. The distance between the emulation header 
and the processor is greater than 6 inches. The emulation signals — TMS, 
TDI, TDO, and TCK_RET — are buffered through the same package. 

Figure B-7. Buffered Signals 

[Schematic: the TMS320C4x is more than 6 inches from the emulator header. TMS, TDI, TDO, and TCK_RET pass through buffers in the same package; EMU0, EMU1, and TRST also connect to the header; PD is tied to +5 V, and the GND pins are grounded.]
□ The EMU0 and EMU1 signals must have pullups to 5 volts. The pullup resistor value should be chosen to provide a signal rise time of less than 10 µs. A 4.7-kΩ resistor is suggested for most applications. EMU0 and EMU1 are I/O pins on the 'C4x; however, they are only inputs to the XDS510. In general, these pins are used in multiprocessor systems to provide global run/stop operations.

□ It is extremely important to provide high-quality signals, especially on the processor TCK and the emulator TCK_RET signal. In some cases, this may require you to provide special PWB trace routing and use termination resistors to match the trace impedance. The emulator pod provides optional internal parallel terminators on TCK_RET and TDO. TMS and TDI have fixed series termination.
B.6 Emulation Timing Calculations

The following examples show how to calculate the emulation timings in your system. For actual target timing parameters, see the appropriate device data sheets.

Assumptions:

tsu(TTMS): target TMS/TDI setup to TCK high, 10 ns
th(TTMS): target TMS/TDI hold from TCK high, 5 ns
td(TTDO): target TDO delay from TCK low, 15 ns
td(bufmax): target buffer delay, maximum, 10 ns
td(bufmin): target buffer delay, minimum, 1 ns
t(bufskew): target buffer skew between two devices in the same package, [td(bufmax) - td(bufmin)] × 0.15 = 1.35 ns
ttckfactor: assume a 40/60 duty-cycle clock, 0.4

Given in Table B-2:

td(XTMXmax): XDS510 TMS/TDI delay from TCK_RET low, maximum, 20 ns
td(XTMXmin): XDS510 TMS/TDI delay from TCK_RET low, minimum, 6 ns
td(XTMSmax): XDS510 TMS/TDI delay from TCK_RET high, maximum, 24 ns
td(XTMSmin): XDS510 TMS/TDI delay from TCK_RET high, minimum, 7 ns
tsu(XTDOmin): TDO setup time to XDS510 TCK_RET high, 3 ns

There are two key timing paths to consider in the emulation design: the TCK_RET/TMS/TDI path, tprd(tck_TMS), and the TCK_RET/TDO path, tprd(tck_TDO).

In each case, the worst-case path delay is calculated to determine the maximum system test clock frequency.
Case 1: Single processor, direct connection, TMS/TDI timed from TCK_RET low (default timing).

$$t_{\mathrm{prd(tck\_TMS)}} = \frac{t_{\mathrm{d(XTMXmax)}} + t_{\mathrm{su(TTMS)}}}{t_{\mathrm{tckfactor}}} = \frac{20\ \mathrm{ns} + 10\ \mathrm{ns}}{0.4} = 75\ \mathrm{ns}\ \ (13.3\ \mathrm{MHz})$$

$$t_{\mathrm{prd(tck\_TDO)}} = \frac{t_{\mathrm{d(TTDO)}} + t_{\mathrm{su(XTDOmin)}}}{t_{\mathrm{tckfactor}}} = \frac{15\ \mathrm{ns} + 3\ \mathrm{ns}}{0.4} = 45\ \mathrm{ns}\ \ (22.2\ \mathrm{MHz})$$

In this case, the TCK/TMS path is the limiting factor.

Case 2: Single processor, direct connection, TMS/TDI timed from TCK_RET high (optional timing).

$$t_{\mathrm{prd(tck\_TMS)}} = t_{\mathrm{d(XTMSmax)}} + t_{\mathrm{su(TTMS)}} = 24\ \mathrm{ns} + 10\ \mathrm{ns} = 34\ \mathrm{ns}\ \ (29.4\ \mathrm{MHz})$$

$$t_{\mathrm{prd(tck\_TDO)}} = \frac{t_{\mathrm{d(TTDO)}} + t_{\mathrm{su(XTDOmin)}}}{t_{\mathrm{tckfactor}}} = \frac{15\ \mathrm{ns} + 3\ \mathrm{ns}}{0.4} = 45\ \mathrm{ns}\ \ (22.2\ \mathrm{MHz})$$

In this case, the TCK/TDO path is the limiting factor. One other thing to consider in this case is the TMS/TDI hold time. The minimum hold time for the XDS510 cable pod is 7 ns, which meets the 5-ns hold time of the target device.

Case 3: Single/multiple processor, TMS/TDI buffered input; TCK_RET/TDO buffered output, TMS/TDI timed from TCK_RET high (optional timing).

$$t_{\mathrm{prd(tck\_TMS)}} = t_{\mathrm{d(XTMSmax)}} + t_{\mathrm{su(TTMS)}} + 2\,t_{\mathrm{d(bufmax)}} = 24\ \mathrm{ns} + 10\ \mathrm{ns} + 2(10\ \mathrm{ns}) = 54\ \mathrm{ns}\ \ (18.5\ \mathrm{MHz})$$

$$t_{\mathrm{prd(tck\_TDO)}} = \frac{t_{\mathrm{d(TTDO)}} + t_{\mathrm{su(XTDOmin)}} + t_{\mathrm{(bufskew)}}}{t_{\mathrm{tckfactor}}} = \frac{15\ \mathrm{ns} + 3\ \mathrm{ns} + 1.35\ \mathrm{ns}}{0.4} = 48.4\ \mathrm{ns}\ \ (20.7\ \mathrm{MHz})$$

In this case, the TCK/TMS path is the limiting factor. The hold time on TMS/TDI is also reduced by the buffer skew (1.35 ns) but still meets the minimum device hold time.
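The hold-time argument in Cases 2 and 3 is a subtraction: the pod's minimum TMS/TDI delay from TCK_RET high (7 ns, Table B-2) acts as the hold time, any buffer skew reduces it, and the remainder must cover the target's 5-ns hold requirement. A minimal sketch using the 0.15 skew factor from the assumptions above; the function names are hypothetical.

```python
def buffer_skew_ns(td_buf_max_ns: float, td_buf_min_ns: float,
                   skew_fraction: float = 0.15) -> float:
    """Skew between two buffers in one package, estimated as
    [td(bufmax) - td(bufmin)] x skew_fraction."""
    return (td_buf_max_ns - td_buf_min_ns) * skew_fraction


def tms_tdi_hold_margin_ns(xds_min_delay_ns: float, target_hold_ns: float,
                           skew_ns: float = 0.0) -> float:
    """Hold slack at the target: the pod's minimum launch delay, less any
    buffer skew, minus the target's required hold time (negative = violation)."""
    return xds_min_delay_ns - skew_ns - target_hold_ns
```

For Case 3 this leaves about 0.65 ns of slack (7 - 1.35 - 5); for the unbuffered Case 2 it leaves 2 ns.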
Case 4: Single/multiprocessor, TMS/TDI/TCK buffered input; TDO buffered output, TMS/TDI timed from TCK_RET low (default timing).

$$t_{\mathrm{prd(tck\_TMS)}} = \frac{t_{\mathrm{d(XTMXmax)}} + t_{\mathrm{su(TTMS)}} + t_{\mathrm{(bufskew)}}}{t_{\mathrm{tckfactor}}} = \frac{20\ \mathrm{ns} + 10\ \mathrm{ns} + 1.35\ \mathrm{ns}}{0.4} = 78.4\ \mathrm{ns}\ \ (12.8\ \mathrm{MHz})$$

$$t_{\mathrm{prd(tck\_TDO)}} = \frac{t_{\mathrm{d(TTDO)}} + t_{\mathrm{su(XTDOmin)}} + t_{\mathrm{d(bufmax)}}}{t_{\mathrm{tckfactor}}} = \frac{15\ \mathrm{ns} + 3\ \mathrm{ns} + 10\ \mathrm{ns}}{0.4} = 70\ \mathrm{ns}\ \ (14.3\ \mathrm{MHz})$$

In this case, the TCK/TMS path is the limiting factor.
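The four cases can be reproduced with a short script. This is a sketch built from the assumptions and Table B-2 values listed above; the constant and function names are hypothetical. For Case 4, which times TMS/TDI from TCK_RET low, it uses the default-timing maximum td(XTMXmax) = 20 ns from Table B-2.

```python
# Values (ns) from the assumptions list and Table B-2.
TSU_TTMS = 10.0      # target TMS/TDI setup to TCK high
TD_TTDO = 15.0       # target TDO delay from TCK low
TD_BUFMAX = 10.0     # target buffer delay, maximum
TD_BUFMIN = 1.0      # target buffer delay, minimum
BUF_SKEW = (TD_BUFMAX - TD_BUFMIN) * 0.15  # about 1.35 ns
TCK_FACTOR = 0.4     # 40/60 duty-cycle clock
TD_XTMX_MAX = 20.0   # XDS510 TMS/TDI delay from TCK_RET low (default), max
TD_XTMS_MAX = 24.0   # XDS510 TMS/TDI delay from TCK_RET high (optional), max
TSU_XTDO = 3.0       # TDO setup to XDS510 TCK_RET high


def case_periods_ns() -> dict[int, tuple[float, float]]:
    """Minimum TCK period (TMS/TDI path, TDO path) for Cases 1-4."""
    tdo_direct = (TD_TTDO + TSU_XTDO) / TCK_FACTOR
    return {
        # Case 1: direct connection, default timing (half-cycle TMS path).
        1: ((TD_XTMX_MAX + TSU_TTMS) / TCK_FACTOR, tdo_direct),
        # Case 2: direct connection, optional timing (full-cycle TMS path).
        2: (TD_XTMS_MAX + TSU_TTMS, tdo_direct),
        # Case 3: TMS/TDI buffered in, TCK_RET/TDO buffered out, optional timing.
        3: (TD_XTMS_MAX + TSU_TTMS + 2 * TD_BUFMAX,
            (TD_TTDO + TSU_XTDO + BUF_SKEW) / TCK_FACTOR),
        # Case 4: TMS/TDI/TCK buffered in, TDO buffered out, default timing.
        4: ((TD_XTMX_MAX + TSU_TTMS + BUF_SKEW) / TCK_FACTOR,
            (TD_TTDO + TSU_XTDO + TD_BUFMAX) / TCK_FACTOR),
    }


if __name__ == "__main__":
    for case, (tms, tdo) in case_periods_ns().items():
        worst = max(tms, tdo)
        path = "TMS/TDI" if tms >= tdo else "TDO"
        print(f"Case {case}: TMS {tms:.1f} ns, TDO {tdo:.1f} ns, "
              f"limit {worst:.1f} ns ({1000 / worst:.1f} MHz, {path} path)")
```

The larger of the two periods in each case sets the minimum TCK period, and its reciprocal gives the maximum test clock frequency.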

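In a multiprocessor application, it is necessary to ensure that the EMU0 and EMU1 lines can go from a logic low level to a logic high level in less than 10 µs. This can be calculated as follows, taking the rise time as t = 5RC (here with 16 devices, each loading the line with 15 pF):

$$t_{\mathrm{rise}} = 5\,(R_{\mathrm{pullup}} \times N_{\mathrm{devices}} \times C_{\mathrm{load\_per\_device}}) = 5\,(4.7\ \mathrm{k\Omega} \times 16 \times 15\ \mathrm{pF}) = 5.64\ \mu\mathrm{s}$$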
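The rise-time estimate can be turned around to ask how many devices one pullup can serve. A minimal sketch using the same 5RC model, the 4.7-kΩ pullup, and 15 pF per device from the calculation above; the per-device capacitance is example data, so substitute your own loading. The function names are hypothetical.

```python
import math


def emu_rise_time_s(r_pullup_ohm: float, n_devices: int,
                    c_per_device_f: float) -> float:
    """Rise time estimated as 5RC, with all devices' load capacitance summed."""
    return 5.0 * r_pullup_ohm * n_devices * c_per_device_f


def max_devices(r_pullup_ohm: float, c_per_device_f: float,
                limit_s: float = 10e-6) -> int:
    """Largest device count whose 5RC rise time stays below limit_s."""
    n = math.floor(limit_s / (5.0 * r_pullup_ohm * c_per_device_f))
    # Step down if floating-point rounding puts n at or above the limit.
    while n > 0 and emu_rise_time_s(r_pullup_ohm, n, c_per_device_f) >= limit_s:
        n -= 1
    return n
```

With these values, 28 devices fit under the 10-µs limit (28 devices give roughly 9.9 µs).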
Note: Primary sources are in boldface.